How to solve NaN loss?

Problem

I'm running a deep neural network on MNIST, where the loss is defined as follows:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=label))

The program seems to run correctly until I get a NaN loss somewhere after the 10,000th minibatch. Sometimes the program runs correctly until it finishes. I think tf.nn.softmax_cross_entropy_with_logits is giving me this error. This is strange, because the code only contains multiply and add operations.

Possible Solution

Maybe I can use:

if cost == "nan":
  optimizer = an empty optimizer 
else:
  ...
  optimizer = real optimizer

But I cannot find the type of NaN. How can I check whether a variable is NaN or not?

How else can I solve this problem?
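
For the check itself: NaN is the one floating-point value that compares unequal to everything, including itself, so it cannot be caught with == "nan". A minimal sketch of the standard checks (loss_value here is a hypothetical Python float fetched from a session run):

import math
import numpy as np

loss_value = float("nan")  # hypothetical fetched loss

print(loss_value != loss_value)  # True: NaN is never equal to itself
print(math.isnan(loss_value))    # True: standard-library check
print(np.isnan(loss_value))      # True: NumPy check, also works elementwise on arrays
# inside the graph, TF 1.x also provides tf.is_nan(tensor)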

asked Oct 20 '16 by Swind D.C. Xu


1 Answer

I found a similar problem here: TensorFlow cross_entropy NaN problem

Thanks to the author, user1111929:

Writing the loss out by hand as the equivalent of tf.nn.softmax_cross_entropy_with_logits,

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))

is actually a horrible way of computing the cross-entropy. In some samples, certain classes can be excluded with certainty after a while, resulting in y_conv = 0 for that sample. That's normally not a problem, since you're not interested in those, but in the way cross_entropy is written there, it yields 0*log(0) for that particular sample/class, and since log(0) is -inf, the product 0*(-inf) is NaN. Hence the NaN loss.
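
To see concretely why this poisons the whole loss, here is a quick illustration in plain NumPy (the two-class arrays are made up for the demonstration):

import numpy as np

y_ = np.array([0.0, 1.0])      # one-hot label
y_conv = np.array([0.0, 1.0])  # predictions: the first class is fully excluded

with np.errstate(divide="ignore", invalid="ignore"):
    term = y_ * np.log(y_conv)  # log(0) gives -inf, and 0 * -inf gives nan

print(term)           # [nan  0.]
print(-np.sum(term))  # nan -- one bad entry makes the entire sum NaN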

Replacing it with

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv + 1e-10))

Or

cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))

solved the NaN problem.
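
For context, here is a minimal sketch of how the clipped loss slots into a TF 1.x training setup; the placeholder names, shapes, and single-layer model are assumptions for illustration, not the asker's actual network:

import tensorflow as tf

# hypothetical MNIST setup: 784-dim inputs, 10-class one-hot labels
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y_conv = tf.nn.softmax(tf.matmul(x, W) + b)

# clip the probabilities away from 0 before the log,
# so log(0) = -inf can never enter the sum
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

Between the two fixes, the clip_by_value variant is arguably the cleaner choice: it leaves every probability at or above 1e-10 exactly as it is, while adding a constant shifts all values slightly.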

answered Oct 28 '22 by demianzhang