Keras Neural Nets, How to remove NaN values in output? [duplicate]

Question

I've noticed that NaNs are frequently introduced during training.

Often they seem to be introduced by weights in inner-product/fully-connected or convolution layers blowing up.

Is this occurring because the gradient computation is blowing up? Or is it because of weight initialization (if so, why does weight initialization have this effect)? Or is it likely caused by the nature of the input data?

The overarching question here is simply: what is the most common reason for NaNs to occur during training? And secondly, what are some methods for combating this (and why do they work)?

Shai · Accepted Answer

This answer is not about a cause of NaNs, but rather proposes a way to help debug them. You can use this Python layer:

import caffe
import numpy as np

class checkFiniteLayer(caffe.Layer):
  def setup(self, bottom, top):
    # param_str from the prototxt is used as a label in the error messages
    self.prefix = self.param_str
  def reshape(self, bottom, top):
    # "in-place" layer: nothing to reshape
    pass
  def forward(self, bottom, top):
    # check the activations arriving at this point of the net
    for i in range(len(bottom)):
      isbad = np.sum(1-np.isfinite(bottom[i].data[...]))
      if isbad>0:
        raise Exception("checkFiniteLayer: %s forward pass bottom %d has %.2f%% non-finite elements" %
                        (self.prefix,i,100*float(isbad)/bottom[i].count))
  def backward(self, top, propagate_down, bottom):
    # check the gradients flowing back through this point of the net
    for i in range(len(top)):
      if not propagate_down[i]:
        continue
      isf = np.sum(1-np.isfinite(top[i].diff[...]))
      if isf>0:
        raise Exception("checkFiniteLayer: %s backward pass top %d has %.2f%% non-finite elements" %
                        (self.prefix,i,100*float(isf)/top[i].count))

Add this layer to your train_val.prototxt at the points you suspect may cause trouble:

layer {
  type: "Python"
  name: "check_loss"
  bottom: "fc2"
  top: "fc2"  # "in-place" layer
  python_param {
    module: "/path/to/python/file/check_finite_layer.py" # must be in $PYTHONPATH
    layer: "checkFiniteLayer"
    param_str: "prefix-check_loss" # string for printouts
  }
}
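
The counting logic used by the layer can be tried outside Caffe on a plain NumPy array. The following is a minimal sketch of that check only; the helper name nonfinite_percent is hypothetical and is not part of Caffe or of the layer above.

```python
import numpy as np

def nonfinite_percent(arr):
    """Return the percentage of NaN/Inf entries in arr (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    if arr.size == 0:
        return 0.0
    # np.isfinite is False for NaN, +Inf and -Inf
    bad = np.count_nonzero(~np.isfinite(arr))
    return 100.0 * bad / arr.size

# Example blob contents: 2 of the 4 entries are non-finite
activations = np.array([[0.5, np.nan], [np.inf, -1.0]])
print(nonfinite_percent(activations))
```

A non-zero result at some point in the net, with zero at the check before it, suggests that the layers in between are where the non-finite values first appear.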

izady · Answer

In my case, not setting the bias in the convolution/deconvolution layers was the cause.

Solution: add the following to the convolution layer parameters.

bias_filler {
  type: "constant"
  value: 0
}

Mohammad Rasoul tanhatalab · Answer

The learning rate is too high and should be decreased. In my case the accuracy in the RNN code was NaN; choosing a lower value for the learning rate fixed it.
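
How a large learning rate can end in NaN is visible on a toy problem. The sketch below runs plain gradient descent on f(x) = x**2, which is a stand-in chosen for illustration and not the RNN from this answer; the function name, step count and starting point are assumptions.

```python
import math

def gradient_descent(lr, steps=2000, x0=1.0):
    """Minimize f(x) = x**2 with a fixed step size; return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # each step multiplies x by (1 - 2*lr)
    return x

small = gradient_descent(lr=0.1)  # |1 - 2*lr| = 0.8 < 1: x shrinks towards 0
large = gradient_descent(lr=1.5)  # |1 - 2*lr| = 2 > 1: |x| doubles every step

print(abs(small) < 1e-6)   # the small step size converges
print(math.isnan(large))   # x overflows to inf, then inf - inf yields NaN
```

With the larger step the iterate grows until it overflows to infinity, and the next update subtracts infinity from infinity, which is NaN. Lowering the learning rate keeps the update factor below 1 in magnitude, so the blow-up never starts.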

Tags:

machine-learning

neural-network

deep-learning

gradient-descent

caffe

Aidan Gomez
