 

Variable-length RNN padding and masking out padding gradients

Tags:

tensorflow

I'm building an RNN and using the sequence_length parameter to supply a list of lengths for the sequences in a batch, where all sequences in a batch are padded to the same length.

However, when doing backprop, is it possible to mask out the gradients corresponding to the padded steps, so that these steps have zero contribution to the weight updates? I'm already masking out their corresponding costs like this (where batch_weights is a vector of 0's and 1's, with 0's at the elements corresponding to the padding steps):

# per-step cross entropy, with padded steps zeroed out by batch_weights
loss = tf.mul(tf.nn.sparse_softmax_cross_entropy_with_logits(logits, tf.reshape(self._targets, [-1])), batch_weights)

# average over the real (unpadded) steps only
self._cost = cost = tf.reduce_sum(loss) / tf.to_float(tf.reduce_sum(batch_weights))

The problem is that I'm not sure whether doing the above actually zeroes out the gradients from the padding steps.
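One way to check this directly is to look at the gradients themselves. Below is a minimal, self-contained sketch in the spirit of the snippet above (TF 1.x-style API; the toy shapes, targets, and batch_weights values are made up, and a standalone logits variable stands in for the real model):

import tensorflow as tf

# toy setup: 6 flattened steps (batch * time), vocabulary of 4;
# the last two steps are padding, so their weight is 0
logits = tf.get_variable("logits", [6, 4])
targets = tf.constant([1, 2, 0, 3, 0, 0])
batch_weights = tf.constant([1., 1., 1., 1., 0., 0.])

loss = tf.multiply(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=targets),
    batch_weights)
cost = tf.reduce_sum(loss) / tf.reduce_sum(batch_weights)

# gradient of the masked cost w.r.t. the logits
grads = tf.gradients(cost, logits)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))  # the rows for the two padded steps come out as all zeros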

nddk asked Mar 01 '16


1 Answer

For all framewise / feed-forward (non-recurrent) operations, masking the loss/cost is enough.
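A minimal sketch of that first point (the per-step losses and lengths here are made up, and tf.sequence_mask is just one way to build the 0/1 weights):

import tensorflow as tf

# hypothetical per-step losses for 2 sequences padded to length 4
step_losses = tf.constant([[0.5, 0.2, 0.1, 0.3],
                           [0.4, 0.6, 0.0, 0.0]])
seq_lengths = tf.constant([4, 2])

# 1 for real steps, 0 for padded steps
mask = tf.sequence_mask(seq_lengths, maxlen=4, dtype=tf.float32)

# masked mean: padded steps contribute nothing to the cost (and hence to the gradients)
cost = tf.reduce_sum(step_losses * mask) / tf.reduce_sum(mask)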

For all sequence / recurrent operations (e.g. dynamic_rnn), there is always a sequence_length parameter which you need to set to the corresponding sequence lengths. Then there won't be a gradient for the zero-padded steps; in other words, they will have zero contribution. See the sketch below.
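A sketch of that second point (TF 1.x-style API; the cell type, placeholders, and sizes here are arbitrary):

import tensorflow as tf

batch_size, max_time, input_dim, hidden_size = 2, 4, 8, 16
inputs = tf.placeholder(tf.float32, [batch_size, max_time, input_dim])
seq_lengths = tf.placeholder(tf.int32, [batch_size])

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)

# beyond each sequence's length, dynamic_rnn emits zero outputs, copies the
# state through, and does not propagate gradients from those steps
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, sequence_length=seq_lengths, dtype=tf.float32)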

Albert answered Oct 12 '22