I'm building an RNN and using the sequence_length parameter to supply a list of lengths for the sequences in a batch, where all sequences in a batch are padded to the same length.
However, when doing backprop, is it possible to mask out the gradients corresponding to the padded steps, so these steps would have 0 contribution to the weight updates? I'm already masking out their corresponding costs like this (where batch_weights is a vector of 0's and 1's, where the elements corresponding to the padding steps are 0's):
```python
loss = tf.mul(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits, tf.reshape(self._targets, [-1])),
    batch_weights)
self._cost = cost = tf.reduce_sum(loss) / tf.to_float(tf.reduce_sum(batch_weights))
```
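For reference, a mask like batch_weights can be derived directly from the per-sequence lengths. Below is a minimal sketch assuming TF 1.x-style APIs and hypothetical names (seq_lengths, num_steps) that are not in the snippet above:

```python
import tensorflow as tf

# Hypothetical inputs: per-example true lengths and the shared padded length.
seq_lengths = tf.placeholder(tf.int32, [None], name="seq_lengths")
num_steps = 20  # padded length of every sequence in the batch (assumed)

# tf.sequence_mask is True for real steps and False for padding; cast to
# float and flatten so it lines up with the flattened per-step losses.
batch_weights = tf.reshape(
    tf.cast(tf.sequence_mask(seq_lengths, maxlen=num_steps), tf.float32),
    [-1])
```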
The problem is that I'm not sure whether doing the above actually zeroes out the gradients from the padding steps.
For all framewise / feed-forward (non-recurrent) operations, masking the loss/cost is enough: each step's logits only affect that step's loss term, so multiplying that term by zero also zeroes its gradient.
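A quick way to convince yourself of this is to inspect the gradient of the masked cost with respect to the logits; the rows belonging to padded steps come out as zeros. A minimal sketch, assuming TF 1.x and a toy batch:

```python
import tensorflow as tf

# Toy example: 3 time steps, 2 classes; the last step is padding (weight 0).
logits = tf.Variable([[2.0, 1.0], [0.5, 1.5], [1.0, 1.0]])
targets = tf.constant([0, 1, 0])
batch_weights = tf.constant([1.0, 1.0, 0.0])

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits) * batch_weights
cost = tf.reduce_sum(loss) / tf.reduce_sum(batch_weights)

grad = tf.gradients(cost, logits)[0]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))  # the row for the padded step is all zeros
```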
For all sequence / recurrent operations (e.g. dynamic_rnn), there is always a sequence_length parameter which you need to set to the corresponding sequence lengths. Then there won't be a gradient for the zero-padded steps, or in other words, they will have zero contribution to the weight updates.