I'm trying to use CTC for speech recognition with Keras and have tried the CTC example here. In that example, the input to the CTC Lambda layer is the output of the softmax layer (y_pred). The Lambda layer calls ctc_batch_cost, which internally calls TensorFlow's ctc_loss, but the TensorFlow ctc_loss documentation says that ctc_loss performs the softmax internally, so you don't need to softmax your input first. I think the correct usage is to pass inner to the Lambda layer so that softmax is applied only once, inside the ctc_loss function. I have tried the example and it works. Should I follow the example or the TensorFlow documentation?
A Connectionist Temporal Classification loss, or CTC loss, is designed for tasks where we need an alignment between sequences but that alignment is hard to obtain, e.g. aligning each character to its location in an audio file. It computes a loss between a continuous (unsegmented) time series and a target sequence.
CTC-based ASR decoding is mainly composed of two major steps: mapping and searching. In the mapping step, the acoustic information of each audio frame is mapped to an output token (such as a character or a blank). This is the alignment process, and it is a many-to-one mapping: many frame-level alignments correspond to the same target sequence.
CTC is an algorithm used to train deep neural networks for speech recognition, handwriting recognition and other sequence problems. It is used when we don't know how the input aligns with the output (how the characters in the transcript align to the audio); a small sketch of the many-to-one collapse is shown below.
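To make the many-to-one mapping concrete, here is a minimal sketch (not from the original question) of how a frame-level CTC alignment collapses to a transcript:

def ctc_collapse(alignment, blank="-"):
    """Collapse a frame-level alignment: merge repeated tokens, then drop blanks."""
    out = []
    prev = None
    for token in alignment:
        if token != prev and token != blank:
            out.append(token)
        prev = token
    return "".join(out)

print(ctc_collapse("hh-e-ll-lo--"))   # hello
print(ctc_collapse("--h-ee--l-l-o"))  # hello  (a different alignment, same transcript)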
The loss used in the code you posted is different from the one you linked. The loss used in the code is found here.
The Keras code performs some pre-processing before calling ctc_loss to put the input into the required format. On top of requiring the input to not be softmax-ed, TensorFlow's ctc_loss also expects the dims to be NUM_TIME, BATCHSIZE, FEATURES. Keras's ctc_batch_cost does both of these things in this line.
It takes the log(), which gets rid of the softmax scaling, and it also shuffles the dims so that the tensor is in the right shape. When I say it gets rid of the softmax scaling, it obviously does not restore the original tensor; rather, softmax(log(softmax(x))) = softmax(x), so the extra softmax does not change the loss. See below:
import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x = np.array([1.0, 2.0, 3.0])
y = softmax(x)
z = np.log(y)              # z =/= x (obviously), BUT
yp = softmax(z)            # yp == y
print(np.allclose(y, yp))  # True
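For completeness, here is a rough sketch of that pre-processing (assumed shapes, a placeholder epsilon, and TensorFlow 2 APIs; it mimics, but is not, the actual Keras source): take the log of the softmax-ed predictions and permute them into the time-major layout that TensorFlow's ctc_loss expects:

import tensorflow as tf

# Assumed shape: y_pred is (BATCHSIZE, NUM_TIME, FEATURES) and already softmax-ed.
batch, num_time, features = 2, 5, 4
y_pred = tf.nn.softmax(tf.random.normal((batch, num_time, features)), axis=-1)

epsilon = 1e-7  # small constant added before the log to avoid log(0)
# log() neutralizes the softmax scaling (softmax(log(softmax(x))) == softmax(x)),
# and the transpose reorders the dims to (NUM_TIME, BATCHSIZE, FEATURES).
time_major_logits = tf.math.log(tf.transpose(y_pred, perm=[1, 0, 2]) + epsilon)
print(time_major_logits.shape)  # (5, 2, 4)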