My question and problem are stated below the two blocks of code.
import numpy as np

def loss(labels, logits, sequence_lengths, label_lengths, logit_lengths):
    scores = []
    for i in xrange(runner.batch_size):
        sequence_length = sequence_lengths[i]
        for j in xrange(sequence_length):
            label_length = label_lengths[i, j]
            logit_length = logit_lengths[i, j]

            # get top k indices <==> argmax_k(labels[i, j, 0, :], label_length)
            top_labels = np.argpartition(labels[i, j, 0, :], -label_length)[-label_length:]
            top_logits = np.argpartition(logits[i, j, 0, :], -logit_length)[-logit_length:]

            scores.append(edit_distance(top_labels, top_logits))

    return np.mean(scores)
# Levenshtein distance
def edit_distance(s, t):
    n = s.size
    m = t.size
    d = np.zeros((n+1, m+1))
    d[:, 0] = np.arange(n+1)
    d[0, :] = np.arange(m+1)

    for j in xrange(1, m+1):
        for i in xrange(1, n+1):
            if s[i-1] == t[j-1]:
                d[i, j] = d[i-1, j-1]
            else:
                d[i, j] = min(d[i-1, j] + 1,    # deletion
                              d[i, j-1] + 1,    # insertion
                              d[i-1, j-1] + 1)  # substitution

    return d[n, m]
I've tried to flatten my code so that everything is happening in one place. Let me know if there are typos/points of confusion.
sequence_lengths_placeholder = tf.placeholder(tf.int64, shape=(batch_size))
labels_placeholder = tf.placeholder(tf.float32, shape=(batch_size, max_feature_length, label_size))
label_lengths_placeholder = tf.placeholder(tf.int64, shape=(batch_size, max_feature_length))
loss_placeholder = tf.placeholder(tf.float32, shape=(1,))
logit_W = tf.Variable(tf.zeros([lstm_units, label_size]))
logit_b = tf.Variable(tf.zeros([label_size]))
length_W = tf.Variable(tf.zeros([lstm_units, max_length]))
length_b = tf.Variable(tf.zeros([max_length]))
lstm = rnn_cell.BasicLSTMCell(lstm_units)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * layer_count)
rnn_out, state = rnn.rnn(stacked_lstm, features, dtype=tf.float32, sequence_length=sequence_lengths_placeholder)
logits = tf.concat(1, [tf.reshape(tf.matmul(t, logit_W) + logit_b, [batch_size, 1, 2, label_size]) for t in rnn_out])
logit_lengths = tf.concat(1, [tf.reshape(tf.matmul(t, length_W) + length_b, [batch_size, 1, max_length]) for t in rnn_out])
optimizer = tf.train.AdamOptimizer(learning_rate)
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss_placeholder, global_step=global_step)
...
...
# Inside training loop
np_labels, np_logits, sequence_lengths, label_lengths, logit_lengths = sess.run([labels_placeholder, logits, sequence_lengths_placeholder, label_lengths_placeholder, logit_lengths], feed_dict=feed_dict)
np_loss = loss(np_labels, np_logits, sequence_lengths, label_lengths, logit_lengths)
_ = sess.run([train_op], feed_dict={loss_placeholder: np_loss})
The issue is that this is returning the error:
File "runner.py", line 63, in <module>
train_op = optimizer.minimize(loss_placeholder, global_step=global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 188, in minimize
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 277, in apply_gradients
(grads_and_vars,))
ValueError: No gradients provided for any variable: <all my variables>
So I assume this is TensorFlow complaining that it can't compute the gradients of my loss, because the loss is computed in numpy, outside the scope of the TF graph.
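For reference, here is my understanding of the usual pattern, as a minimal sketch with made-up shapes and names (not my actual model): the loss is built entirely from TF ops on the graph's variables, so minimize() can trace gradients back to them.
x = tf.placeholder(tf.float32, shape=(None, 3))
y = tf.placeholder(tf.float32, shape=(None, 1))
w = tf.Variable(tf.zeros([3, 1]))

pred = tf.matmul(x, w)
mse = tf.reduce_mean(tf.square(pred - y))  # differentiable TF ops only

train_step = tf.train.AdamOptimizer(0.01).minimize(mse)
# In the training loop only the inputs are fed:
#     sess.run(train_step, feed_dict={x: batch_x, y: batch_y})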
So naturally, to fix that, I would try to implement this in TensorFlow. The issue is that my logit_lengths and label_lengths are both Tensors, so when I try to access a single element I get back a Tensor of shape []. That is a problem when I try to use tf.nn.top_k(), which takes an int for its k parameter.
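To make that concrete, here is a minimal sketch of where it breaks down, with made-up names and shapes rather than my actual graph:
lengths = tf.placeholder(tf.int64, shape=(4, 10))  # per-step target lengths
scores = tf.placeholder(tf.float32, shape=(50,))   # one row of logits

k_tensor = lengths[0, 0]                 # Tensor of shape [], not a Python int
# tf.nn.top_k(scores, k=k_tensor)        # <- this is what I can't do
top_vals, top_idx = tf.nn.top_k(scores, k=5)  # only works with a static int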
Another issue is that my label_lengths is a Placeholder, and since my loss value needs to be defined before the optimizer.minimize(loss) call, I also get an error saying that a value needs to be fed for the placeholder.
I'm just wondering how I could implement this loss function, or whether I'm missing something obvious.
Edit: After some further reading, I see that losses like the one I describe are usually used for validation, and that training instead minimizes a surrogate loss that stands in for the true loss. Does anyone know what surrogate loss is used for an edit-distance-based scenario like mine?
The first thing I would do is calculate the loss using TensorFlow instead of numpy. That lets TensorFlow compute the gradients for you, so you can back-propagate, meaning you can minimize the loss.
There is a tf.edit_distance function (https://www.tensorflow.org/api_docs/python/tf/edit_distance) in the core library.
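As a rough sketch of how it could be used (toy tensors, not your shapes; both arguments must be SparseTensors, with the hypothesis built from the decoded network outputs and the truth from the labels):
import tensorflow as tf

# Toy example with one batch entry per tensor.
hypothesis = tf.SparseTensor(
    [[0, 0], [0, 1], [0, 2]],  # indices: (batch, position)
    [1, 2, 3],                 # predicted label sequence
    [1, 3])                    # dense shape: 1 sequence of length 3

truth = tf.SparseTensor(
    [[0, 0], [0, 1], [0, 2]],
    [1, 2, 4],                 # true label sequence
    [1, 3])

# normalize=False gives the raw Levenshtein distance (here, one substitution).
distance = tf.edit_distance(hypothesis, truth, normalize=False)

with tf.Session() as sess:
    print(sess.run(distance))  # -> [ 1.]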
So naturally, to fix that, I would try to implement this in TensorFlow. The issue is that my logit_lengths and label_lengths are both Tensors, so when I try to access a single element I get back a Tensor of shape []. That is a problem when I try to use tf.nn.top_k(), which takes an int for its k parameter.
Could you provide a bit more detail on why that is an issue?