Training a model with tf.nn.ctc_loss
produces an error every time the train op is run:
tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Unlike in previous questions about this function, this is not due to divergence. I have a low learning rate, and the error occurs on even the first train op.
The model is a CNN -> LSTM -> CTC. Here is the model creation code:
# Build Graph
self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32)
self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32)
self.keep_prob = tf.placeholder(dtype=tf.float32)
self.targets = tf.sparse_placeholder(tf.int32)
self.targetLengths = tf.placeholder(shape=(None), dtype=tf.int32)
conv1 = tf.layers.conv3d(self.videoInput ...)
pool1 = tf.layers.max_pooling3d(conv1 ...)
conv2 = ...
pool2 = ...
conv3 = ...
pool3 = ...
cnn_out = tf.reshape(pool3, shape=(-1, self.maxVidLength, 4*7*96))
fw_cell = tf.nn.rnn_cell.MultiRNNCell(self.cell(), for _ in range(3))
bw_cell = tf.nn.rnn_cell.MultiRNNCell(self.cell(), for _ in range(3))
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
fw_cell, bw_cell, cnn_out, sequence_length=self.videoLengths, dtype=tf.float32)
outputs = tf.concat(outputs, 2)
outputs = tf.reshape(outputs, [-1, self.hidden_size * 2])
w = tf.Variable(tf.random_normal((self.hidden_size * 2, len(self.char2index) + 1), stddev=0.2))
b = tf.Variable(tf.zeros(len(self.char2index) + 1))
out = tf.matmul(outputs, w) + b
out = tf.reshape(out, [-1, self.maxVidLen, len(self.char2index) + 1])
out = tf.transpose(out, [1, 0, 2])
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths))
self.train_op = tf.train.AdamOptimizer(0.0001).minimize(cost)
And here is the feed dict creation code:
indices = []
values = []
shape = [len(vids) * 2, self.maxLabelLen]
vidInput = np.zeros((len(vids) * 2, self.maxVidLen, 50, 100, 3), dtype=np.float32)
# Actual video, then left-right flip
for j in range(len(vids) * 2):
# K is video index
k = j if j < len(vids) else j - len(vids)
# convert video and label to input format
vidInput[j, 0:len(vids[k])] = vids[k] if k == j else vids[k][:,::-1,:]
indices.extend([j, i] for i in range(len(labelList[k])))
values.extend(self.char2index[c] for c in labelList[k])
fd[self.targets] = (indices, values, shape)
fd[self.videoInput] = vidInput
# Collect video lengths and label lengths
vidLengths = [len(j) for j in vids] + [len(j) for j in vids]
labelLens = [len(l) for l in labelList] + [len(l) for l in labelList]
fd[self.videoLengths] = vidLengths
fd[self.targetLengths] = labelLens
It turns out that the ctc_loss requires that the label lengths be shorter than the input lengths. If the label lengths are too long, the loss calculator cannot unroll completely and therefore cannot compute the loss.
For example, the label BIFI
would require input length of at least 4 while the label BIIF
would require input length of at least 5 due to a blank being inserted between the repeated symbols.
I had the same issue but I soon realized it was just because I was using glob
and my label was in the filename so it was exceeding.
You can fix this issue by using:
os.path.join(*(filename.split(os.path.sep)[noOfDir:]))
For me the problem was fixed by setting preprocess_collapse_repeated=True
.
FWIW: My target sequence length was already shorter than inputs, and the RNN outputs are that of softmax.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With