I have a problem where, after one iteration, nearly all of my parameters (cost function, weights, hypothesis function, etc.) output 'NaN'. My code is similar to the TensorFlow MNIST-Expert tutorial (https://www.tensorflow.org/versions/r0.9/tutorials/mnist/pros/index.html). I looked for solutions already and so far I have tried: reducing the learning rate to nearly zero and setting it to zero, using AdamOptimizer instead of gradient descent, using a sigmoid function for the hypothesis function in the last layer, and using only numpy functions. I have some negative and zero values in my input data, so I can't use the logarithmic cross entropy instead of the quadratic cost function. The result is always the same. My input data consists of stresses and strains of soils.
import tensorflow as tf
import Datafiles3_pv_complete as soil
import numpy as np
m_training = int(18.0)
m_cv = int(5.0)
m_test = int(5.0)
total_examples = 28
" range for running "
range_training = xrange(0,m_training)
range_cv = xrange(m_training,(m_training+m_cv))
range_test = xrange((m_training+m_cv),total_examples)
""" Using interactive Sessions"""
sess = tf.InteractiveSession()
""" creating input and output vectors """
x = tf.placeholder(tf.float32, shape=[None, 11])
y_true = tf.placeholder(tf.float32, shape=[None, 3])
""" Standard Deviation Calculation"""
stdev = np.divide(2.0,np.sqrt(np.prod(x.get_shape().as_list()[1:])))
""" Weights and Biases """
def weights(shape):
    initial = tf.truncated_normal(shape, stddev=stdev)
    return tf.Variable(initial)
def bias(shape):
    initial = tf.truncated_normal(shape, stddev=1.0)
    return tf.Variable(initial)
""" Creating weights and biases for all layers """
theta1 = weights([11,7])
bias1 = bias([1,7])
theta2 = weights([7,7])
bias2 = bias([1,7])
"Last layer"
theta3 = weights([7,3])
bias3 = bias([1,3])
""" Hidden layer input (Sum of weights, activation functions and bias)
z = theta^T * activation + bias
"""
def Z_Layer(activation, theta, bias):
    return tf.add(tf.matmul(activation, theta), bias)
""" Creating the sigmoid function
sigmoid = 1 / (1 + exp(-z))
"""
def Sigmoid(z):
    return tf.div(tf.constant(1.0), tf.add(tf.constant(1.0), tf.exp(tf.neg(z))))
""" hypothesis functions - predicted output """
' layer 1 - input layer '
hyp1 = x
' layer 2 '
z2 = Z_Layer(hyp1, theta1, bias1)
hyp2 = Sigmoid(z2)
' layer 3 '
z3 = Z_Layer(hyp2, theta2, bias2)
hyp3 = Sigmoid(z3)
' layer 4 - output layer '
zL = Z_Layer(hyp3, theta3, bias3)
hypL = tf.add( tf.add(tf.pow(zL,3), tf.pow(zL,2) ), zL)
""" Cost function """
cost_function = tf.mul( tf.div(0.5, m_training), tf.pow( tf.sub(hypL, y_true), 2))
#cross_entropy = -tf.reduce_sum(y_true*tf.log(hypL) + (1-y_true)*tf.log(1-hypL))
""" Gradient Descent """
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.003).minimize(cost_function)
""" Training and Evaluation """
correct_prediction = tf.equal(tf.arg_max(hypL, 1), tf.arg_max(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())
keep_prob = tf.placeholder(tf.float32)
""" Testing - Initialise lists """
hyp1_test = []
z2_test = []
hyp2_test = []
z3_test = []
hyp3_test = []
zL_test = []
hypL_test = []
cost_function_test =[]
complete_error_test = []
theta1_test = []
theta2_test = []
theta3_test = []
bias1_test = []
bias2_test = []
bias3_test = []
""" ------------------------- """
complete_error_init = tf.abs(tf.reduce_mean(tf.sub(hypL,y_true),1))
training_error=[]
for j in range_training:
    feedj = {x: soil.input_scale[j], y_true: soil.output_scale[j], keep_prob: 1.0}
    """ ------------------------- """
    'Testing - adding to list'
    z2_init = z2.eval(feed_dict=feedj)
    z2_test.append(z2_init)
    hyp2_init = hyp2.eval(feed_dict=feedj)
    hyp2_test.append(hyp2_init)
    z3_init = z3.eval(feed_dict=feedj)
    z3_test.append(z3_init)
    hyp3_init = hyp3.eval(feed_dict=feedj)
    hyp3_test.append(hyp3_init)
    zL_init = zL.eval(feed_dict=feedj)
    zL_test.append(zL_init)
    hypL_init = hypL.eval(feed_dict=feedj)
    hypL_test.append(hypL_init)
    cost_function_init = cost_function.eval(feed_dict=feedj)
    cost_function_test.append(cost_function_init)
    complete_error = complete_error_init.eval(feed_dict=feedj)
    complete_error_test.append(complete_error)
    print 'number iterations: %g, error (S1, S2, S3): %g, %g, %g' % (j, complete_error[0], complete_error[1], complete_error[2])
    theta1_init = theta1.eval()
    theta1_test.append(theta1_init)
    theta2_init = theta2.eval()
    theta2_test.append(theta2_init)
    theta3_init = theta3.eval()
    theta3_test.append(theta3_init)
    bias1_init = bias1.eval()
    bias1_test.append(bias1_init)
    bias2_init = bias2.eval()
    bias2_test.append(bias2_init)
    bias3_init = bias3.eval()
    bias3_test.append(bias3_init)
    """ ------------------------- """
    train_accuracy = accuracy.eval(feed_dict=feedj)
    print("step %d, training accuracy %g" % (j, train_accuracy))
    train_step.run(feed_dict=feedj)
    training_error.append(1 - train_accuracy)
cv_error=[]
for k in range_cv:
    feedk = {x: soil.input_scale[k], y_true: soil.output_scale[k], keep_prob: 1.0}
    cv_accuracy = accuracy.eval(feed_dict=feedk)
    print("cross-validation accuracy %g" % cv_accuracy)
    cv_error.append(1 - cv_accuracy)
for l in range_test:
    print("test accuracy %g" % accuracy.eval(feed_dict={x: soil.input_matrixs[l], y_true: soil.output_matrixs[l], keep_prob: 1.0}))
For the last few weeks I have been working on a unit model of this problem, but the same output occurred. I have no idea what to try next. I hope someone can help me.
I checked some parameters in detail again. The hypothesis function (hyp) and activation function (z) for layers 3 and 4 (the last layer) have the same entries for each data point, i.e. the same value in every row of a given column.
The reason for nan, inf or -inf often comes from the fact that division by 0.0 in TensorFlow does not raise a division-by-zero exception; it can silently produce a nan, inf or -inf value. Your training data may contain 0.0, so somewhere in your loss function you could end up dividing by 0.0.
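A quick way to see this behaviour (a minimal sketch, assuming TensorFlow 2.x eager execution rather than the r0.9 API used in the question):

import tensorflow as tf

# Division by zero does not raise in TensorFlow; it silently yields inf/nan.
x = tf.constant([1.0, 0.0, -1.0])
print(tf.math.divide(x, 0.0))        # -> [inf, nan, -inf]

# One common guard: add a small epsilon to any denominator that can be zero.
eps = 1e-8
print(tf.math.divide(x, 0.0 + eps))  # finite (large) values instead of inf/nan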
For TensorFlow 2, inject some x = tf.debugging.check_numerics(x, 'x is nan') calls into your code. They will raise an InvalidArgumentError if x contains any values that are not a number (NaN) or are infinite (Inf).
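A minimal sketch of that check, assuming TensorFlow 2.x eager execution:

import tensorflow as tf

# check_numerics passes the tensor through unchanged, but raises
# InvalidArgumentError as soon as it contains a NaN or Inf entry.
z = tf.constant([1.0, float('nan'), 3.0])
try:
    z = tf.debugging.check_numerics(z, 'z is nan')
except tf.errors.InvalidArgumentError as e:
    print('caught:', e.message)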
What are NaN values? NaN (Not a Number) is a special floating-point value that, in DataFrames and numpy arrays, marks a missing or undefined value in a cell. Other languages have an equivalent; in Python, missing data is often represented by None, but None is an ordinary object and not the same thing as NaN.
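For example, NaN is a floating-point value that never compares equal to itself, while None is a singleton Python object:

import numpy as np

a = np.array([1.0, np.nan, 3.0])
print(np.isnan(a))        # [False  True False] -- detect NaN entries
print(np.nan == np.nan)   # False -- NaN never equals itself
print(None is None)       # True  -- None is a singleton object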
1e-3 is still fairly high for the classifier you've described. NaN usually means that the weights have diverged towards infinity, so I would suggest exploring even lower learning rates, around 1e-7 specifically. If training continues to diverge, multiply your learning rate by 0.1 and repeat until the weights stay finite-valued.
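A rough sketch of that search, where train_model is a hypothetical helper that runs one full training attempt at a given learning rate and returns the final loss:

import numpy as np

def find_stable_learning_rate(train_model, start_lr=1e-3, min_lr=1e-10):
    lr = start_lr
    while lr >= min_lr:
        final_loss = train_model(learning_rate=lr)
        if np.isfinite(final_loss):
            return lr      # loss/weights stayed finite at this learning rate
        lr *= 0.1          # diverged to NaN/Inf -> try a 10x smaller rate
    return None            # no stable learning rate found in this range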
Finally, no more NaN values. The solution is to scale my input and output data. The result (accuracy) is still not good, but at least I get some real values for the parameters. I tried feature scaling before in other attempts (where I probably had some other mistakes as well) and assumed it wouldn't help with my problem either.
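For reference, a minimal sketch of one such scaling; it is not necessarily the exact scheme used above, but min-max scaling of each feature to [0, 1] is a common choice for data that contains zeros and negative values:

import numpy as np

def min_max_scale(data, eps=1e-8):
    # Scale each column (feature) to [0, 1]; eps avoids division by zero
    # when a column is constant.
    lo = data.min(axis=0, keepdims=True)
    hi = data.max(axis=0, keepdims=True)
    return (data - lo) / (hi - lo + eps)

# Toy example with negative and zero entries, like stress/strain inputs:
raw = np.array([[-2.0, 0.0, 5.0],
                [ 1.0, 3.0, 7.0],
                [ 0.0, 1.0, 6.0]])
print(min_max_scale(raw))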