 

Keras CTC Loss input

I'm trying to use CTC for speech recognition with Keras and have tried the CTC example here. In that example, the input to the CTC Lambda layer is the output of the softmax layer (y_pred). The Lambda layer calls ctc_batch_cost, which internally calls TensorFlow's ctc_loss, but the TensorFlow ctc_loss documentation says that ctc_loss performs the softmax internally, so you don't need to softmax your input first. I therefore think the correct usage is to pass inner (the pre-softmax output) to the Lambda layer, so that the softmax is applied only once, inside ctc_loss. I have tried the example and it works. Should I follow the example or the TensorFlow documentation?
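For reference, the wiring in question looks roughly like this. This is a minimal sketch modeled on the Keras OCR example (names like inner and ctc_lambda_func follow that example; the shapes and sizes here are illustrative, not from the example itself):

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Activation, Dense, Input, Lambda
from tensorflow.keras.models import Model

num_classes, max_label_len = 28, 16        # illustrative sizes

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

features = Input(name='features', shape=(None, 64))   # (time, feat) frames
inner = Dense(num_classes, name='inner')(features)    # pre-softmax scores
y_pred = Activation('softmax', name='softmax')(inner)

labels = Input(name='the_labels', shape=(max_label_len,))
input_length = Input(name='input_length', shape=(1,), dtype='int64')
label_length = Input(name='label_length', shape=(1,), dtype='int64')

# the example feeds the softmaxed y_pred into the CTC Lambda layer:
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])
model = Model([features, labels, input_length, label_length], loss_out)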

asked Apr 18 '17 by DNK

People also ask

Is CTC a loss function?

A Connectionist Temporal Classification Loss, or CTC Loss, is designed for tasks where we need alignment between sequences, but where that alignment is difficult - e.g. aligning each character to its location in an audio file. It calculates a loss between a continuous (unsegmented) time series and a target sequence.
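A minimal sketch of computing this loss with TensorFlow 2's tf.nn.ctc_loss (the shapes and label values are made up for illustration); note that the inputs are raw logits, not softmax probabilities:

import tensorflow as tf

batch, time, classes = 2, 50, 28                    # e.g. 27 symbols + 1 blank
logits = tf.random.normal([batch, time, classes])   # raw scores, not softmaxed
labels = tf.constant([[3, 1, 20], [8, 9, 0]], dtype=tf.int32)  # padded targets
label_length = tf.constant([3, 2], dtype=tf.int32)
logit_length = tf.fill([batch], time)

loss = tf.nn.ctc_loss(labels, logits, label_length, logit_length,
                      logits_time_major=False,   # our logits are batch-major
                      blank_index=classes - 1)   # last class is the blank
print(tf.reduce_mean(loss))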

What is CTC ASR?

Connectionist temporal classification (CTC) ASR decoding is mainly composed of two major steps: the mapping and the searching. In the mapping, we map the acoustic information of an audio frame to a triphone state. This is the alignment process. This is a many-to-one mapping.
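As a toy illustration of the mapping step (pure NumPy; the alphabet and scores are made up), greedy CTC decoding picks the best class per frame and then collapses the frame-level path into a transcript:

import numpy as np

alphabet = ['-', 'a', 'b']            # index 0 is the blank

def collapse(path):
    """Merge repeated symbols, then drop blanks (CTC's many-to-one map)."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return ''.join(s for s in merged if s != '-')

scores = np.random.rand(6, len(alphabet))             # fake per-frame scores
path = [alphabet[i] for i in scores.argmax(axis=1)]   # best class per frame
print(path, '->', collapse(path))   # e.g. ['a','a','-','b','b','-'] -> 'ab'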

What does CTC method in speech recognition do?

CTC is an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems. CTC is used when we don't know how the input aligns with the output (how the characters in the transcript align to the audio).
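To see why the alignment is ambiguous, here is a tiny sketch (toy alphabet, '-' as the blank) that enumerates every 4-frame path collapsing to the transcript 'ab':

from itertools import product

def collapse(path, blank='-'):
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return ''.join(s for s in merged if s != blank)

paths = [''.join(p) for p in product('ab-', repeat=4) if collapse(p) == 'ab']
print(len(paths), paths)   # many distinct alignments, one transcript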

What is SortaGrad?

SortaGrad uses the length of the utterance as a heuristic for difficulty, since long utterances have higher cost than short utterances.
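A minimal sketch of that curriculum (the utterance records and their num_frames field are hypothetical): process utterances shortest-first in the first epoch, then shuffle in later epochs:

import random

def epoch_order(utterances, epoch):
    """SortaGrad-style ordering: shortest (easiest) first in epoch 1,
    plain shuffling from epoch 2 on."""
    if epoch == 1:
        return sorted(utterances, key=lambda u: u['num_frames'])
    return random.sample(utterances, len(utterances))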


1 Answer

The loss used in the code you posted is different from the one you linked; the loss the code actually uses is Keras's ctc_batch_cost, found here in the Keras backend.

The Keras code performs some pre-processing before calling ctc_loss that puts the input into the format ctc_loss requires. Besides requiring the input not to be softmaxed, TensorFlow's ctc_loss also expects the dimensions to be ordered (num_time, batch_size, num_features). Keras's ctc_batch_cost does both of these things in a single line, sketched below.
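Roughly, that pre-processing looks like this (a paraphrase of the Keras backend, not a verbatim copy of its source; tf.compat.v1 names are used so it runs on TF 2):

import tensorflow as tf
from tensorflow.keras import backend as K

def ctc_batch_cost_sketch(y_true, y_pred, input_length, label_length):
    """Approximation of K.ctc_batch_cost: y_pred arrives softmaxed,
    shaped (batch, time, classes)."""
    label_length = tf.cast(tf.squeeze(label_length, axis=-1), tf.int32)
    input_length = tf.cast(tf.squeeze(input_length, axis=-1), tf.int32)
    sparse_labels = tf.cast(
        K.ctc_label_dense_to_sparse(y_true, label_length), tf.int32)
    # the key line: log() undoes the softmax scaling, and the transpose
    # reorders (batch, time, classes) -> (time, batch, classes)
    y_pred = tf.math.log(tf.transpose(y_pred, perm=[1, 0, 2]) + K.epsilon())
    return tf.expand_dims(
        tf.compat.v1.nn.ctc_loss(labels=sparse_labels, inputs=y_pred,
                                 sequence_length=input_length), 1)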

That line takes log(), which gets rid of the softmax scaling, and it also transposes the dimensions into the right shape. When I say it gets rid of the softmax scaling: it obviously does not restore the original tensor, but softmax(log(softmax(x))) = softmax(x), so the internal softmax in ctc_loss recovers the same probabilities. See below:

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))   # shift by the max for numerical stability
    return e_x / e_x.sum()

x = np.array([1.0, 2.0, 3.0])
y = softmax(x)
z = np.log(y)               # z != x (obviously), BUT
yp = softmax(z)             # yp == y
print(np.allclose(y, yp))   # True
answered Oct 01 '22 by Prophecies