
NaN results in tensorflow Neural Network

After one iteration, nearly all my parameters (cost function, weights, hypothesis function, etc.) output 'NaN'. My code is similar to the TensorFlow MNIST-Expert tutorial (https://www.tensorflow.org/versions/r0.9/tutorials/mnist/pros/index.html). I have already looked for solutions, and so far I have tried: reducing the learning rate to nearly zero (and setting it to zero), using AdamOptimizer instead of gradient descent, using the sigmoid function for the hypothesis function in the last layer, and using only numpy functions. The result is always the same. I have some negative and zero values in my input data, so I can't use logarithmic cross entropy instead of the quadratic cost function. My input data consist of stresses and strains of soils.

import tensorflow as tf
import Datafiles3_pv_complete as soil
import numpy as np

m_training = int(18.0)
m_cv = int(5.0)
m_test = int(5.0)
total_examples = 28

" range for running "
range_training = xrange(0,m_training)
range_cv = xrange(m_training,(m_training+m_cv))
range_test = xrange((m_training+m_cv),total_examples)

""" Using interactive Sessions"""
sess = tf.InteractiveSession()

""" creating input and output vectors """
x = tf.placeholder(tf.float32, shape=[None, 11])
y_true = tf.placeholder(tf.float32, shape=[None, 3])

""" Standard Deviation Calculation"""
stdev = np.divide(2.0,np.sqrt(np.prod(x.get_shape().as_list()[1:])))

""" Weights and Biases """

def weights(shape):
    initial = tf.truncated_normal(shape, stddev=stdev)
    return tf.Variable(initial)

def bias(shape):
    initial = tf.truncated_normal(shape, stddev=1.0)
    return tf.Variable(initial)

""" Creating weights and biases for all layers """
theta1 = weights([11,7])
bias1 = bias([1,7])

theta2 = weights([7,7])
bias2 = bias([1,7])

"Last layer"
theta3 = weights([7,3])
bias3 = bias([1,3])


""" Hidden layer input (Sum of weights, activation functions and bias)
z = theta^T * activation + bias
"""
def Z_Layer(activation,theta,bias):
    return tf.add(tf.matmul(activation,theta),bias)

""" Creating the sigmoid function 
sigmoid = 1 / (1 + exp(-z))
"""
def Sigmoid(z):
    return tf.div(tf.constant(1.0),tf.add(tf.constant(1.0), tf.exp(tf.neg(z))))

""" hypothesis functions - predicted output """    
' layer 1 - input layer '
hyp1 = x
' layer 2 '
z2 = Z_Layer(hyp1, theta1, bias1)
hyp2 = Sigmoid(z2)
' layer 3 '
z3 = Z_Layer(hyp2, theta2, bias2)
hyp3 = Sigmoid(z3)
' layer 4 - output layer '
zL = Z_Layer(hyp3, theta3, bias3)
hypL = tf.add( tf.add(tf.pow(zL,3), tf.pow(zL,2) ), zL)


""" Cost function """
cost_function = tf.mul( tf.div(0.5, m_training), tf.pow( tf.sub(hypL, y_true), 2)) 

#cross_entropy = -tf.reduce_sum(y_true*tf.log(hypL) + (1-y_true)*tf.log(1-hypL))

""" Gradient Descent """
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.003).minimize(cost_function)       

"""    Training and Evaluation     """

correct_prediction = tf.equal(tf.arg_max(hypL, 1), tf.arg_max(y_true, 1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess.run(tf.initialize_all_variables())

keep_prob = tf.placeholder(tf.float32)

""" Testing - Initialise lists  """
hyp1_test = []
z2_test = []
hyp2_test = []
z3_test = []
hyp3_test = []
zL_test = []
hypL_test = []
cost_function_test =[]
complete_error_test = []
theta1_test = []
theta2_test = []
theta3_test = []
bias1_test = []
bias2_test = []
bias3_test = []
""" -------------------------   """

complete_error_init = tf.abs(tf.reduce_mean(tf.sub(hypL,y_true),1))

training_error=[]
for j in range_training:
    feedj = {x: soil.input_scale[j], y_true: soil.output_scale[j] , keep_prob: 1.0}

    """ -------------------------   """
    'Testing - adding to list'
    z2_init = z2.eval(feed_dict=feedj)
    z2_test.append(z2_init)

    hyp2_init = hyp2.eval(feed_dict=feedj)
    hyp2_test.append(hyp2_init)

    z3_init = z3.eval(feed_dict=feedj)
    z3_test.append(z3_init)

    hyp3_init = hyp3.eval(feed_dict=feedj)
    hyp3_test.append(hyp3_init)

    zL_init = zL.eval(feed_dict=feedj)
    zL_test.append(zL_init)

    hypL_init = hypL.eval(feed_dict=feedj)
    hypL_test.append(hypL_init)

    cost_function_init = cost_function.eval(feed_dict=feedj)
    cost_function_test.append(cost_function_init)

    complete_error = complete_error_init.eval(feed_dict=feedj)
    complete_error_test.append(complete_error)
    print 'number iterations: %g, error (S1, S2, S3): %g, %g, %g' % (j, complete_error[0], complete_error[1], complete_error[2])

    theta1_init = theta1.eval()
    theta1_test.append(theta1_init)

    theta2_init = theta2.eval()
    theta2_test.append(theta2_init)

    theta3_init = theta3.eval()
    theta3_test.append(theta3_init)

    bias1_init = bias1.eval()
    bias1_test.append(bias1_init)

    bias2_init = bias2.eval()
    bias2_test.append(bias2_init)

    bias3_init = bias3.eval()
    bias3_test.append(bias3_init)
    """ -------------------------   """

    train_accuracy = accuracy.eval(feed_dict=feedj)
    print("step %d, training accuracy %g" % (j, train_accuracy))
    train_step.run(feed_dict=feedj)
    training_error.append(1 - train_accuracy)

cv_error=[]    
for k in range_cv:
    feedk = {x: soil.input_scale[k], y_true: soil.output_scale[k], keep_prob: 1.0}
    cv_accuracy = accuracy.eval(feed_dict=feedk)
    print("cross-validation accuracy %g" % cv_accuracy)
    cv_error.append(1-cv_accuracy) 

for l in range_test:
    print("test accuracy %g" % accuracy.eval(feed_dict={x: soil.input_matrixs[l], y_true: soil.output_matrixs[l], keep_prob: 1.0}))

For the last few weeks I have been working on a unit model of this problem, but the same output occurred. I have no idea what to try next. I hope someone can help me.

Edit:

I checked some parameters in detail again. The hypothesis function (hyp) and the pre-activation (z) of layers 3 and 4 (the last layer) have the same entries for each data point, i.e. the same value in every row of a given column.
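One way to narrow this down is to evaluate each intermediate tensor and report the first one that goes bad. A minimal sketch, assuming the session, feed dict and tensor names from the code above are in scope:

import numpy as np

def first_bad_tensor(tensors, names, feed):
    # Evaluate each tensor and return the name of the first one
    # that contains a NaN or Inf entry.
    for t, name in zip(tensors, names):
        val = t.eval(feed_dict=feed)
        if np.any(np.isnan(val)) or np.any(np.isinf(val)):
            return name
    return None

bad = first_bad_tensor(
    [z2, hyp2, z3, hyp3, zL, hypL, cost_function],
    ['z2', 'hyp2', 'z3', 'hyp3', 'zL', 'hypL', 'cost_function'],
    feedj)
print('first tensor with NaN/Inf: %s' % bad)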

asked Sep 27 '16 by DeniseLotti

People also ask

What causes NaN in TensorFlow?

The reason for nan, inf or -inf often comes from the fact that division by 0.0 in TensorFlow doesn't raise a division-by-zero exception; it silently produces a nan, inf or -inf value. If your training data contains 0.0, your loss function may end up performing a division by 0.0.
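For example, a minimal sketch using the same TF 1.x-era session API as the question:

import tensorflow as tf

sess = tf.InteractiveSession()
a = tf.constant([1.0, 0.0, -1.0])
b = tf.constant([0.0, 0.0, 0.0])
# No exception is raised; the result is [inf, nan, -inf]
print(sess.run(tf.div(a, b)))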

What does NaN mean in TensorFlow?

For TensorFlow 2, inject x = tf.debugging.check_numerics(x, 'x is nan') into your code. It will throw an InvalidArgument error if x has any values that are not a number (NaN) or infinity (Inf).
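A minimal sketch of that check under TensorFlow 2 (eager execution):

import tensorflow as tf

x = tf.constant([1.0, float('nan'), 3.0])
# Raises tf.errors.InvalidArgumentError with the message 'x is nan',
# because x contains a NaN entry.
x = tf.debugging.check_numerics(x, 'x is nan')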

What is NaN in deep learning?

What are NaN values? NaN, or Not a Number, is a special value in DataFrames and numpy arrays that represents a missing value in a cell. Programming languages represent missing values in various ways; in Python, for example, they can also appear as None.
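In numpy they can be detected with np.isnan; note that a direct equality comparison fails, since NaN is not equal to anything, including itself:

import numpy as np

a = np.array([1.0, np.nan, 3.0])
print(np.isnan(a))        # [False  True False]
print(np.isnan(a).any())  # True
print(a[1] == np.nan)     # False -- NaN != NaN by definition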


2 Answers

1e-3 is still fairly high for the classifier you've described. NaN here means that the weights have diverged to infinity, so I would suggest exploring even lower learning rates, around 1e-7 specifically. If training continues to diverge, multiply your learning rate by 0.1 and repeat until the weights stay finite.
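Applied to the question's code, that search could look roughly like this (a sketch only, assuming the question's session, placeholders and soil data module are in scope; it rebuilds the optimizer and re-initializes the variables for each candidate rate):

import numpy as np
import tensorflow as tf

lr = 1e-3
while True:
    train_step = tf.train.GradientDescentOptimizer(learning_rate=lr).minimize(cost_function)
    sess.run(tf.initialize_all_variables())
    for j in range_training:
        train_step.run(feed_dict={x: soil.input_scale[j],
                                  y_true: soil.output_scale[j], keep_prob: 1.0})
    final_cost = cost_function.eval(feed_dict={x: soil.input_scale[0],
                                               y_true: soil.output_scale[0], keep_prob: 1.0})
    if np.all(np.isfinite(final_cost)):
        break     # this rate kept the weights finite
    lr *= 0.1     # diverged: retry with a 10x smaller rate
print('first stable learning rate: %g' % lr)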

answered Nov 14 '22 by Alvin Wan


Finally, no more NaN values. The solution was to scale my input and output data. The result (accuracy) is still not good, but at least I get real values for the parameters. I had tried feature scaling in earlier attempts (where I probably had some other mistakes as well) and assumed it wouldn't help with this problem either.
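For reference, a minimal scaling sketch (standardizing each column to zero mean and unit variance; the dummy arrays below stand in for the soil stress/strain data, which is not shown in the question):

import numpy as np

def standardize(data, eps=1e-8):
    # Scale each column to zero mean and unit variance.
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    return (data - mean) / (std + eps)

raw_inputs = np.random.randn(28, 11) * 50.0 + 100.0   # dummy stand-in data
raw_outputs = np.random.randn(28, 3) * 10.0           # dummy stand-in data
inputs_scaled = standardize(raw_inputs)
outputs_scaled = standardize(raw_outputs)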

answered Nov 14 '22 by DeniseLotti