 

Keras Neural Nets, How to remove NaN values in output? [duplicate]

I've noticed that a frequent occurrence during training is NaNs being introduced.

Often it seems to be caused by weights in inner-product/fully-connected or convolution layers blowing up.

Is this occurring because the gradient computation is blowing up? Or is it because of weight initialization (if so, why does weight initialization have this effect)? Or is it likely caused by the nature of the input data?

The overarching question here is simply: what is the most common reason for NaNs occurring during training? And secondly, what are some methods for combating this (and why do they work)?

asked Nov 27 '15 by Aidan Gomez

3 Answers

This answer is not about a cause of NaNs, but rather proposes a way to help debug them. You can add this Python layer:

import caffe
import numpy as np


class checkFiniteLayer(caffe.Layer):
    def setup(self, bottom, top):
        # param_str from the prototxt is used as a prefix in error messages
        self.prefix = self.param_str

    def reshape(self, bottom, top):
        pass

    def forward(self, bottom, top):
        # count the non-finite (NaN/Inf) elements in each bottom blob
        for i in range(len(bottom)):
            isbad = np.sum(1 - np.isfinite(bottom[i].data[...]))
            if isbad > 0:
                raise Exception("checkFiniteLayer: %s forward pass bottom %d has %.2f%% non-finite elements" %
                                (self.prefix, i, 100 * float(isbad) / bottom[i].count))

    def backward(self, top, propagate_down, bottom):
        # run the same check on the gradients flowing back through each top blob
        for i in range(len(top)):
            if not propagate_down[i]:
                continue
            isbad = np.sum(1 - np.isfinite(top[i].diff[...]))
            if isbad > 0:
                raise Exception("checkFiniteLayer: %s backward pass top %d has %.2f%% non-finite elements" %
                                (self.prefix, i, 100 * float(isbad) / top[i].count))

Add this layer to your train_val.prototxt at the points where you suspect trouble:

layer {
  type: "Python"
  name: "check_loss"
  bottom: "fc2"
  top: "fc2"  # "in-place" layer
  python_param {
    module: "check_finite_layer"  # the file check_finite_layer.py must be on $PYTHONPATH
    layer: "checkFiniteLayer"
    param_str: "prefix-check_loss" # string for printouts
  }
}
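Since the question is about Keras while this answer targets Caffe, here is a minimal sketch of an analogous per-layer check in Keras/TensorFlow 2.x. The layer names (fc1, check_fc1, fc2) and shapes are illustrative, not from the original:

import tensorflow as tf
from tensorflow import keras

def check_finite(name):
    # tf.debugging.check_numerics raises an InvalidArgumentError the moment
    # the tensor passing through contains a NaN or Inf; otherwise it acts
    # as an identity, so the layer is "in-place" like the prototxt one.
    return keras.layers.Lambda(
        lambda x: tf.debugging.check_numerics(x, message=name), name=name)

inputs = keras.Input(shape=(784,))            # illustrative input shape
x = keras.layers.Dense(128, activation="relu", name="fc1")(inputs)
x = check_finite("check_fc1")(x)              # check right after fc1
outputs = keras.layers.Dense(10, activation="softmax", name="fc2")(x)
model = keras.Model(inputs, outputs)

Keras also ships keras.callbacks.TerminateOnNaN, which stops training as soon as the loss becomes NaN, but a per-layer check like the one above narrows down where the non-finite values first appear.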
answered by Shai


In my case, not setting the bias in the convolution/deconvolution layers was the cause.

Solution: add the following to the convolution layer parameters.

bias_filler {
  type: "constant"
  value: 0
}
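For the Keras case in the question, a rough equivalent (filter count and kernel size here are just placeholders) is to make sure the convolution layer has a zero-initialized bias:

from tensorflow import keras

conv = keras.layers.Conv2D(
    filters=64,                 # placeholder value
    kernel_size=3,              # placeholder value
    use_bias=True,              # make sure the bias exists at all
    bias_initializer="zeros",   # constant 0, matching bias_filler above
)

Note that in Keras these are already the defaults, so this particular pitfall is more Caffe-specific.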
answered by izady


The learning rate may be too high and should be decreased. In my case the accuracy in my RNN code was NaN; selecting a lower value for the learning rate fixed it.
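In Keras terms, a minimal sketch of this fix (the values are illustrative; clipnorm is an optional extra guard against exploding gradients):

from tensorflow import keras

optimizer = keras.optimizers.Adam(
    learning_rate=1e-4,  # lower than the common default of 1e-3
    clipnorm=1.0,        # optional: also cap the gradient norm
)
model.compile(optimizer=optimizer, loss="categorical_crossentropy")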

answered by Mohammad Rasoul tanhatalab