All,
When training a large model on a large number of samples, some samples can produce NaN gradients during the parameter update.
I want to identify those samples, and at the same time I don't want that batch's gradients to update the model's parameters, because that could make the parameters themselves NaN.
So does anyone have a good idea how to deal with this problem?
My code is like below:
# Create an optimizer.
params = tf.trainable_variables()
opt = tf.train.AdamOptimizer(1e-3)
gradients = tf.gradients(self.loss, params)
max_gradient_norm = 10
clipped_gradients, self.gradient_norms = tf.clip_by_global_norm(
    gradients, max_gradient_norm)
self.optimizer = opt.apply_gradients(zip(clipped_gradients, params))
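One in-graph option, shown as a rough sketch building on the snippet above, is to replace non-finite gradients with zeros and expose a boolean flag you can fetch each step, so you can both log which batches were bad and keep their gradients from reaching the variables. Note that with a momentum-based optimizer such as Adam a zero gradient still produces a small update from the accumulated moments; only plain SGD becomes an exact no-op. The name grads_are_finite is mine, not part of the original code:
# Sketch: neutralize non-finite gradients and expose a flag for logging.
# grads_are_finite is a hypothetical name, not part of the original code.
grads_are_finite = tf.reduce_all(
    tf.stack([tf.reduce_all(tf.is_finite(g)) for g in clipped_gradients]))
safe_gradients = [tf.where(tf.is_finite(g), g, tf.zeros_like(g))
                  for g in clipped_gradients]
# This replaces the apply_gradients line above.
self.optimizer = opt.apply_gradients(zip(safe_gradients, params))
Fetching grads_are_finite together with self.optimizer in sess.run then tells you, per step, whether the current batch produced a bad gradient.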
Because of the way TensorFlow computes gradients (via the chain rule), intermediate expressions can produce NaNs or +/-Infs even when the final result is mathematically well defined. The best fix would probably be for TensorFlow to detect these patterns and replace them with their analytically simplified equivalent, but in practice you have to guard against them yourself.
The tf.where() function can be used to filter or remove values from a tensor, for example to drop NaN or Inf values. However, using tf.where() this way can itself produce a NaN gradient.
In short, if the input to a tf.where contains NaNs, the gradient will always be NaN, regardless of whether that input is actually selected or not, and the workaround is to prevent the inputs from ever containing NaNs.
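Here is a minimal, self-contained illustration of that behaviour and the workaround (TF1-style graph code; the tensors are just for the example). The unselected tf.log(0) branch is still differentiated, and 0 * inf gives NaN, so the fix is to make the input to the risky branch finite everywhere before tf.where:
import tensorflow as tf

x = tf.constant([0.0, 1.0])

# Naive version: the gradient of tf.log at x=0 is infinite, and even though
# tf.where never selects that element, 0 * inf = NaN poisons the gradient.
y_bad = tf.where(x > 0.0, tf.log(x), tf.zeros_like(x))

# Workaround: make the input to the risky branch finite everywhere first.
safe_x = tf.where(x > 0.0, x, tf.ones_like(x))
y_good = tf.where(x > 0.0, tf.log(safe_x), tf.zeros_like(x))

grad_bad = tf.gradients(tf.reduce_sum(y_bad), x)[0]    # -> [nan, 1.]
grad_good = tf.gradients(tf.reduce_sum(y_good), x)[0]  # -> [0., 1.]
(On TF 2.x the same idea applies with tf.math.log.)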
Automatic differentiation is what makes backpropagation work when training neural networks, and TensorFlow offers several ways to compute gradients, especially in eager execution.
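In eager mode (TF 2.x) this kind of inspection is more direct: you can compute the gradients with tf.GradientTape, check them in plain Python, and simply skip apply_gradients for a bad batch. A rough sketch, assuming a model, loss_fn, and optimizer that are not part of the original post:
import tensorflow as tf

def train_step(model, loss_fn, optimizer, x_batch, y_batch):
    """One update step that skips batches producing non-finite gradients."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)

    # Check every gradient tensor for NaN/Inf before touching the weights.
    finite = all(tf.reduce_all(tf.math.is_finite(g)).numpy()
                 for g in grads if g is not None)
    if finite:
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, finite
Returning the finite flag lets the training loop log exactly which batches were skipped.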
You can check whether your gradients have NaN with tf.check_numerics:
grad_check = [tf.check_numerics(g, "NaN or Inf in clipped gradient")
              for g in clipped_gradients]
with tf.control_dependencies(grad_check):
    self.optimizer = opt.apply_gradients(zip(clipped_gradients, params))
The grad_check ops raise an InvalidArgumentError if any of the clipped_gradients is NaN or infinite, and tf.control_dependencies makes sure that grad_check is evaluated before the gradients are applied.
Also see tf.add_check_numerics_ops().
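With that check in place you can catch the error in your training loop, which both identifies the offending batch and prevents the update (apply_gradients is never reached when a check fails, thanks to the control dependency). A sketch of such a loop; the feed names self.inputs and self.targets are placeholders for whatever your input pipeline actually uses:
for step, (batch_x, batch_y) in enumerate(batches):
    try:
        sess.run(self.optimizer,
                 feed_dict={self.inputs: batch_x, self.targets: batch_y})
    except tf.errors.InvalidArgumentError:
        # check_numerics fired: this batch produced NaN/Inf gradients, and
        # because of the control dependency no parameters were updated.
        print("Skipping batch %d: non-finite gradients" % step)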