Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Tensorflow's Connectionist Temporal Classification (CTC) implementation

I'm trying to use the Tensorflow's CTC implementation under contrib package (tf.contrib.ctc.ctc_loss) without success.

  • First of all, anyone know where can I read a good step-by-step tutorial? Tensorflow's documentation is very poor on this topic.
  • Do I have to provide to ctc_loss the labels with the blank label interleaved or not?
  • I could not be able to overfit my network even using a train dataset of length 1 over 200 epochs. :(
  • How can I calculate the label error rate using tf.edit_distance?

Here is my code:

with graph.as_default():

  max_length = X_train.shape[1]
  frame_size = X_train.shape[2]
  max_target_length = y_train.shape[1]

  # Batch size x time steps x data width
  data = tf.placeholder(tf.float32, [None, max_length, frame_size])
  data_length = tf.placeholder(tf.int32, [None])

  #  Batch size x max_target_length
  target_dense = tf.placeholder(tf.int32, [None, max_target_length])
  target_length = tf.placeholder(tf.int32, [None])

  #  Generating sparse tensor representation of target
  target = ctc_label_dense_to_sparse(target_dense, target_length)

  # Applying LSTM, returning output for each timestep (y_rnn1, 
  # [batch_size, max_time, cell.output_size]) and the final state of shape
  # [batch_size, cell.state_size]
  y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes), #  num_proj=num_classes
    data,
    dtype=tf.float32,
    sequence_length=data_length,
  )

  #  For sequence labelling, we want a prediction for each timestamp. 
  #  However, we share the weights for the softmax layer across all timesteps. 
  #  How do we do that? By flattening the first two dimensions of the output tensor. 
  #  This way time steps look the same as examples in the batch to the weight matrix. 
  #  Afterwards, we reshape back to the desired shape


  # Reshaping
  logits = tf.transpose(y_rnn1, perm=(1, 0, 2))

  #  Get the loss by calculating ctc_loss
  #  Also calculates
  #  the gradient.  This class performs the softmax operation for you, so    inputs
  #  should be e.g. linear projections of outputs by an LSTM.
  loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))

  #  Define our optimizer with learning rate
  optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)

  #  Decoding using beam search
  decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)

Thanks!

Update (06/29/2016)

Thank you, @jihyeon-seo! So, we have at input of RNN something like [num_batch, max_time_step, num_features]. We use the dynamic_rnn to perform the recurrent calculations given the input, outputting a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need to do an affine projection in each tilmestep with weight sharing, so we've to reshape to [num_batch*max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], sum a bias undo the reshape, transpose (so we will have [max_time_steps, num_batch, num_classes] for ctc loss input), and this result will be the input of ctc_loss function. Did I do everything correct?

This is the code:

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights accross timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Reshaping
    self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)

Update (07/11/2016)

Thank you @Xiv. Here is the code after the bug fix:

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights accross timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Reshaping
    self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
    self._logits = tf.transpose(self._logits, (1,0,2))

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)

Update (07/25/16)

I published on GitHub part of my code, working with one utterance. Feel free to use! :)

like image 799
Igor Macedo Quintanilha Avatar asked Jun 27 '16 16:06

Igor Macedo Quintanilha


People also ask

How does CTC algorithm work?

The CTC algorithm is alignment-free — it doesn't require an alignment between the input and the output. However, to get the probability of an output given an input, CTC works by summing over the probability of all possible alignments between the two.

What is CTC machine learning?

Connectionist Temporal Classification (CTC) is a type of Neural Network output helpful in tackling sequence problems like handwriting and speech recognition where the timing varies. Using CTC ensures that one does not need an aligned dataset, which makes the training process more straightforward.

What does CTC method in speech recognition do?

CTC is an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems. CTC is used when we don't know how the input aligns with the output (how the characters in the transcript align to the audio).

What is CTC decoder?

ctcdecode is an implementation of CTC (Connectionist Temporal Classification) beam search decoding for PyTorch. C++ code borrowed liberally from Paddle Paddles' DeepSpeech. It includes swappable scorer support enabling standard beam search, and KenLM-based decoding.


1 Answers

I'm trying to do the same thing. Here's what I found you may be interested in.

It was really hard to find the tutorial for CTC, but this example was helpful.

And for the blank label, CTC layer assumes that the blank index is num_classes - 1, so you need to provide an additional class for the blank label.

Also, CTC network performs softmax layer. In your code, RNN layer is connected to CTC loss layer. Output of RNN layer is internally activated, so you need to add one more hidden layer (it could be output layer) without activation function, then add CTC loss layer.

like image 160
J. Seo Avatar answered Oct 02 '22 14:10

J. Seo