I'm trying to use TensorFlow's CTC implementation from the contrib package (tf.contrib.ctc.ctc_loss) without success.
Here is my code:
with graph.as_default():
    max_length = X_train.shape[1]
    frame_size = X_train.shape[2]
    max_target_length = y_train.shape[1]

    # Batch size x time steps x data width
    data = tf.placeholder(tf.float32, [None, max_length, frame_size])
    data_length = tf.placeholder(tf.int32, [None])

    # Batch size x max_target_length
    target_dense = tf.placeholder(tf.int32, [None, max_target_length])
    target_length = tf.placeholder(tf.int32, [None])

    # Generating the sparse tensor representation of the targets
    # (ctc_label_dense_to_sparse is a user-defined helper)
    target = ctc_label_dense_to_sparse(target_dense, target_length)

    # Applying an LSTM, returning the output for each timestep (y_rnn1,
    # [batch_size, max_time, cell.output_size]) and the final state of shape
    # [batch_size, cell.state_size]
    y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes),
        data,
        dtype=tf.float32,
        sequence_length=data_length,
    )

    # For sequence labelling, we want a prediction for each timestep, with the
    # projection weights shared across all timesteps. With num_proj=num_classes
    # the cell already projects to num_classes, so here we only go from
    # batch-major to time-major, as ctc_loss expects.
    logits = tf.transpose(y_rnn1, perm=(1, 0, 2))

    # Get the loss by calculating ctc_loss; it also computes the gradient.
    # ctc_loss performs the softmax operation for you, so its inputs should be
    # unactivated (linear) projections, e.g. of LSTM outputs.
    loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))

    # Define our optimizer with learning rate
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)

    # Decoding using beam search
    decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(
        logits, data_length, beam_width=10, top_paths=1)
Thanks!
Update (06/29/2016)
Thank you, @jihyeon-seo! So, at the input of the RNN we have something like [num_batch, max_time_step, num_features]. We use dynamic_rnn to perform the recurrent calculations given the input, outputting a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need to do an affine projection in each timestep with weight sharing, so we have to reshape to [num_batch * max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], add a bias, undo the reshape, and transpose (so we will have [max_time_steps, num_batch, num_classes] for the ctc_loss input), and this result will be the input of the ctc_loss function. Did I do everything correctly?
This is the code:
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
Update (07/11/2016)
Thank you, @Xiv! Here is the code after the bug fix (the reshape above to [max_length, -1, num_classes] does not turn batch-major output into time-major; instead, the logits have to be reshaped back to [-1, max_length, num_classes] and then transposed):
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
self._logits = tf.transpose(self._logits, (1,0,2))
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
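As an aside, the targets fed to ctc_loss must be a SparseTensor. Here is a minimal host-side sketch of that conversion, as an alternative to the ctc_label_dense_to_sparse graph helper above (the placeholder and helper names are illustrative, not from my actual code):

import numpy as np
import tensorflow as tf

# Three placeholders assembled into the SparseTensor that ctc_loss expects.
t_indices = tf.placeholder(tf.int64, [None, 2])
t_values = tf.placeholder(tf.int32, [None])
t_shape = tf.placeholder(tf.int64, [2])
targets = tf.SparseTensor(t_indices, t_values, t_shape)

def dense_to_sparse_feed(dense_labels, lengths):
    """Turn a zero-padded [batch, max_target_length] int array into the
    (indices, values, shape) feeds for the placeholders above."""
    indices = [[b, t] for b, n in enumerate(lengths) for t in range(n)]
    values = [dense_labels[b, t] for b, t in indices]
    shape = [len(lengths), int(max(lengths))]
    return (np.asarray(indices, np.int64),
            np.asarray(values, np.int32),
            np.asarray(shape, np.int64))

# idx, vals, shp = dense_to_sparse_feed(y_train, y_train_lengths)
# sess.run(cost, {t_indices: idx, t_values: vals, t_shape: shp, ...})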
Update (07/25/2016)
I published part of my code on GitHub, working with one utterance. Feel free to use it! :)
Connectionist Temporal Classification (CTC) is a neural network output layer and loss used to train deep networks on sequence problems such as handwriting and speech recognition, where the timing varies and we don't know how the input aligns with the output (e.g. how the characters in the transcript align to the audio). CTC is alignment-free: it doesn't require an aligned dataset, which makes training more straightforward. To get the probability of an output given an input, CTC sums the probabilities of all possible alignments between the two.
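To make "summing over all possible alignments" concrete, here is a toy brute-force sketch in plain Python/NumPy. It is exponential in the number of frames and for illustration only; real implementations use the forward-backward dynamic program instead:

import itertools
import numpy as np

def collapse(path, blank):
    """CTC collapse: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)

def ctc_prob_brute_force(probs, target, blank):
    """probs: [T, num_classes] per-frame softmax outputs.
    Sums the probability of every frame-level path that collapses to target."""
    T, C = probs.shape
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if collapse(path, blank) == tuple(target):
            total += np.prod([probs[t, s] for t, s in enumerate(path)])
    return total

# Toy check: 3 frames, labels {0, 1} plus blank index 2 (num_classes - 1).
# Five paths collapse to (0, 1), each with probability (1/3)^3, so 5/27.
probs = np.full((3, 3), 1.0 / 3.0)
print(ctc_prob_brute_force(probs, target=[0, 1], blank=2))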
ctcdecode is an implementation of CTC (Connectionist Temporal Classification) beam search decoding for PyTorch. Its C++ code is borrowed liberally from PaddlePaddle's DeepSpeech. It includes swappable scorer support, enabling standard beam search as well as KenLM-based decoding.
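For completeness, a usage sketch along the lines of the ctcdecode README (PyTorch; the argument names should be checked against the repo, as they may have changed):

import torch
from ctcdecode import CTCBeamDecoder

labels = ["_", "a", "b", "c"]  # "_" stands for the CTC blank here
decoder = CTCBeamDecoder(labels, beam_width=100, blank_id=0,
                         log_probs_input=False)

# Network output: [batch, seq_len, num_labels] softmax probabilities.
probs = torch.softmax(torch.rand(1, 50, len(labels)), dim=2)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Best hypothesis for the first utterance, as label indices:
best = beam_results[0][0][:out_lens[0][0]]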
I'm trying to do the same thing. Here's what I found that may interest you.
It was really hard to find a tutorial for CTC, but this example was helpful.
As for the blank label, the CTC layer assumes that the blank index is num_classes - 1, so you need to provide an additional class for the blank label.
Also, the CTC network performs the softmax itself. In your code, the RNN layer is connected directly to the CTC loss layer. The output of the RNN layer is internally activated, so you need to add one more hidden layer (it could be the output layer) without an activation function, and then add the CTC loss layer.
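Putting those two points together, here is a minimal sketch in the same contrib-era API as the question (variable names are illustrative): one extra class is reserved for the blank, and a linear projection with no activation sits between the RNN and the CTC loss:

num_classes = num_labels + 1  # extra class for the blank (index num_classes - 1)

outputs, _ = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True),
    data, sequence_length=data_length, dtype=tf.float32)

# Linear projection, no activation: ctc_loss applies the softmax itself.
W = tf.Variable(tf.truncated_normal([num_hidden, num_classes], stddev=0.1))
b = tf.Variable(tf.zeros([num_classes]))
flat = tf.reshape(outputs, [-1, num_hidden])
logits = tf.reshape(tf.matmul(flat, W) + b, [-1, max_length, num_classes])
logits = tf.transpose(logits, (1, 0, 2))  # time-major for ctc_loss

loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))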