All,
When training a large model on a large number of samples, some samples can produce NaN gradients during the parameter update.
I want to identify those samples, and at the same time I don't want that batch's gradients to update the model's parameters, because that could make the parameters themselves NaN.
So does anyone have a good idea how to deal with this problem?
My code is like below:
# Create an optimizer.
params = tf.trainable_variables()
opt = tf.train.AdamOptimizer(1e-3)
gradients = tf.gradients(self.loss, params)
max_gradient_norm = 10
clipped_gradients, self.gradient_norms = tf.clip_by_global_norm(
    gradients, max_gradient_norm)
self.optimizer = opt.apply_gradients(zip(clipped_gradients, params))
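One in-graph option, shown as a rough sketch building on the snippet above, is to replace non-finite gradients with zeros and expose a boolean flag you can fetch each step, so you can both log which batches were bad and keep their gradients from reaching the variables. Note that with a momentum-based optimizer such as Adam a zero gradient still produces a small update from the accumulated moments; only plain SGD becomes an exact no-op. The name grads_are_finite is mine, not part of the original code:
# Sketch: neutralize non-finite gradients and expose a flag for logging.
# grads_are_finite is a hypothetical name, not part of the original code.
grads_are_finite = tf.reduce_all(
    tf.stack([tf.reduce_all(tf.is_finite(g)) for g in clipped_gradients]))
safe_gradients = [tf.where(tf.is_finite(g), g, tf.zeros_like(g))
                  for g in clipped_gradients]
# This replaces the apply_gradients line above.
self.optimizer = opt.apply_gradients(zip(safe_gradients, params))
Fetching grads_are_finite together with self.optimizer in sess.run then tells you, per step, whether the current batch produced a bad gradient.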
Because of the way TensorFlow computes gradients (via the chain rule), intermediate expressions can produce NaNs or +/-Infs even when the final result is mathematically well defined. The best fix would probably be for TensorFlow to detect these patterns and replace them with their analytically simplified equivalent, but in practice you have to guard against them yourself.
The tf.where() function can be used to filter or remove values from a tensor, for example to drop NaN or Inf values. However, using tf.where() this way can itself produce a NaN gradient.
In short, if the input to a tf.where contains NaNs, the gradient will always be NaN, regardless of whether that input is actually selected or not, and the workaround is to prevent the inputs from ever containing NaNs.
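Here is a minimal, self-contained illustration of that behaviour and the workaround (TF1-style graph code; the tensors are just for the example). The unselected tf.log(0) branch is still differentiated, and 0 * inf gives NaN, so the fix is to make the input to the risky branch finite everywhere before tf.where:
import tensorflow as tf

x = tf.constant([0.0, 1.0])

# Naive version: the gradient of tf.log at x=0 is infinite, and even though
# tf.where never selects that element, 0 * inf = NaN poisons the gradient.
y_bad = tf.where(x > 0.0, tf.log(x), tf.zeros_like(x))

# Workaround: make the input to the risky branch finite everywhere first.
safe_x = tf.where(x > 0.0, x, tf.ones_like(x))
y_good = tf.where(x > 0.0, tf.log(safe_x), tf.zeros_like(x))

grad_bad = tf.gradients(tf.reduce_sum(y_bad), x)[0]    # -> [nan, 1.]
grad_good = tf.gradients(tf.reduce_sum(y_good), x)[0]  # -> [0., 1.]
(On TF 2.x the same idea applies with tf.math.log.)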
Automatic differentiation is what makes backpropagation work when training neural networks, and TensorFlow offers several ways to compute gradients, especially in eager execution.
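In eager mode (TF 2.x) this kind of inspection is more direct: you can compute the gradients with tf.GradientTape, check them in plain Python, and simply skip apply_gradients for a bad batch. A rough sketch, assuming a model, loss_fn, and optimizer that are not part of the original post:
import tensorflow as tf

def train_step(model, loss_fn, optimizer, x_batch, y_batch):
    """One update step that skips batches producing non-finite gradients."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)

    # Check every gradient tensor for NaN/Inf before touching the weights.
    finite = all(tf.reduce_all(tf.math.is_finite(g)).numpy()
                 for g in grads if g is not None)
    if finite:
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, finite
Returning the finite flag lets the training loop log exactly which batches were skipped.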
You can check whether your gradients have NaN with tf.check_numerics:
grad_check = [tf.check_numerics(g, "NaN or Inf in clipped gradient")
              for g in clipped_gradients]
with tf.control_dependencies(grad_check):
    self.optimizer = opt.apply_gradients(zip(clipped_gradients, params))
The grad_check ops raise an InvalidArgumentError if any of the clipped_gradients is NaN or infinite, and tf.control_dependencies makes sure that grad_check is evaluated before the gradients are applied.
Also see tf.add_check_numerics_ops().
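With that check in place you can catch the error in your training loop, which both identifies the offending batch and prevents the update (apply_gradients is never reached when a check fails, thanks to the control dependency). A sketch of such a loop; the feed names self.inputs and self.targets are placeholders for whatever your input pipeline actually uses:
for step, (batch_x, batch_y) in enumerate(batches):
    try:
        sess.run(self.optimizer,
                 feed_dict={self.inputs: batch_x, self.targets: batch_y})
    except tf.errors.InvalidArgumentError:
        # check_numerics fired: this batch produced NaN/Inf gradients, and
        # because of the control dependency no parameters were updated.
        print("Skipping batch %d: non-finite gradients" % step)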