Keras + Tensorflow : Debug NaNs

Question

Here is a great question on how to find the first occurrence of NaN in a TensorFlow graph:

Debugging nans in the backward pass

The answer is quite helpful; here is the code from it:

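import tensorflow as tf  # TensorFlow 1.x API

train_op = ...
check_op = tf.add_check_numerics_ops()  # attaches a check_numerics op to every floating-point tensor

sess = tf.Session()
sess.run([train_op, check_op])  # Runs training and checks for NaNs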

Apparently, running the training op and the numerical check together raises an error as soon as a NaN is encountered for the first time.
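To illustrate the idea behind such a check, here is a minimal conceptual sketch in plain Python. It is not TensorFlow's implementation; the names check_numerics, run_checked and the steps pipeline are hypothetical and exist only for this example. Each step's output is validated immediately, so the error names the first step that produced a NaN or Inf:

```python
import math


def check_numerics(name, values):
    """Raise if any value is NaN or Inf, naming the step that produced it."""
    for v in values:
        if not math.isfinite(v):
            raise FloatingPointError(
                "non-finite value after step %r: %r" % (name, v))
    return values


def run_checked(steps, values):
    """Apply each (name, fn) step in order and check its output right away."""
    for name, fn in steps:
        values = check_numerics(name, [fn(v) for v in values])
    return values


# Hypothetical pipeline: the 'square' step overflows to inf for large inputs.
steps = [
    ("scale", lambda v: v * 1e200),
    ("square", lambda v: v * v),
    ("subtract", lambda v: v - v),  # would turn inf into NaN if reached
]

try:
    run_checked(steps, [2.0])
except FloatingPointError as err:
    print(err)  # the message names the first offending step
```

Because the check runs after every step, the failure is reported at 'square' rather than at the later 'subtract' step, where the value would already have degraded to NaN.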

How do I integrate this into Keras? I can't find anything like this in the documentation.

I checked the code, too. The update step is executed here: https://github.com/fchollet/keras/blob/master/keras/engine/training.py

There is a function called _make_train_function where an operation to compute the loss and apply updates is created. This is later called to train the network.

I could change the code like this (always assuming that we're running on a tf backend):

check_op = tf.add_check_numerics_ops()

self.train_function = K.function(inputs, 
    [self.total_loss] + self.metrics_tensors + [check_op],
    updates=updates, name='train_function', **self._function_kwargs)

I'm currently trying to set this up properly and am not sure whether the code above actually works. Maybe there is an easier way?

Manas George · Accepted Answer

I've been running into the exact same problem and found an alternative to the tf.add_check_numerics_ops() function. Instead of going that route, I use the TensorFlow Debugger (tfdbg) to walk through my model, following the example in https://www.tensorflow.org/guide/debugger, to figure out exactly where my code produces NaNs. This snippet should replace the TensorFlow session that Keras is using with a debugging session, allowing you to use tfdbg.

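from keras import backend as K  # assumes standalone Keras on the TensorFlow 1.x backend
from tensorflow.python import debug as tf_debug

sess = K.get_session()  # the session Keras is currently using
sess = tf_debug.LocalCLIDebugWrapperSession(sess)  # wrap it in the tfdbg command-line debugger
K.set_session(sess)  # make Keras run through the wrapped session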

Tags: python, machine-learning, neural-network, tensorflow, keras

lhk
