 

Cross entropy loss suddenly increases to infinity

I am attempting to replicate a deep convolutional neural network from a research paper. I have implemented the architecture, but after 10 epochs my cross entropy loss suddenly increases to infinity, as can be seen in the chart below. You can ignore what happens to the accuracy after the problem occurs.

Here is the GitHub repository with a picture of the architecture.

After doing some research, I think using an AdamOptimizer or ReLU might be the problem.

import tensorflow as tf

# Each example is a flattened 7168-element input; labels are one-hot over 3 classes per element
x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])

# Many convolutions and ReLUs omitted (weight_variable / bias_variable are helpers from the repository)

final = tf.reshape(final, [-1, 7168])
keep_prob = tf.placeholder(tf.float32)
W_final = weight_variable([7168, 7168, 3])
b_final = bias_variable([7168, 3])
# tensordot contracts the 7168-feature axis, giving per-element logits of shape [batch, 7168, 3]
final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final

# Softmax cross entropy over the 3 classes at every element, averaged into a scalar loss
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))
train_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(final_conv, 2), tf.argmax(y_, 2))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

EDIT: If anyone is interested, the solution was that I was basically feeding in incorrect data.
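Since the fix turned out to be bad input data, a quick per-batch sanity check before the feed_dict can catch that kind of problem early. The sketch below is my own illustration, not code from the repository; check_batch and next_batch are hypothetical names, and the shape checks assume the placeholder shapes from the code above.

import numpy as np

def check_batch(images, labels):
    # Hypothetical helper: validate a batch before it is fed to the graph.
    assert np.all(np.isfinite(images)), "inputs contain NaN or Inf"
    assert images.shape[1] == 7168, "unexpected input width"
    assert labels.shape[1:] == (7168, 3), "unexpected label shape"
    # Every position should carry a valid one-hot label over the 3 classes.
    assert np.all(np.isin(labels, [0, 1])), "labels must be 0/1"
    assert np.all(labels.sum(axis=2) == 1), "labels must be one-hot"

# Usage (next_batch is a placeholder for however the data is actually loaded):
# images, labels = next_batch(batch_size)
# check_batch(images, labels)
# sess.run(train_step, feed_dict={x: images, y_: labels, keep_prob: 0.5})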

asked Feb 03 '18 by Devin Haslam

People also ask

Can cross-entropy loss be infinite?

Cross entropy loss: if the right class is predicted with probability 1, the loss is 0; if the right class is predicted with probability 0 (totally wrong), the loss is infinity.
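As a quick worked example of that statement, the per-example loss on the true class is -log(p), which is 0 when p = 1 and grows without bound as p approaches 0:

import numpy as np

# Cross entropy contribution of the true class is -log(p_true)
for p in [1.0, 0.5, 1e-7, 0.0]:
    print(p, -np.log(p))   # 0.0, 0.693..., ~16.1, inf (numpy warns about log(0))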

Why does training loss suddenly increase?

One possible reason could be numerical instability of some weights or gradients. For example, some weights or gradients might become too small, so that calculations with them give incorrect ("exploding") results. The same can happen if they become too large.

How do you interpret cross-entropy losses?

Cross entropy loss is a metric used to measure how well a classification model in machine learning performs. The loss is a non-negative number, with 0 corresponding to a perfect model and no upper bound as predictions get worse. The goal is generally to get your model as close to 0 as possible.

What does high cross-entropy mean?

Taking the negative log of the predicted probabilities is what is known as cross-entropy: a high number means a bad model and a low number means a good model. When we take the negative log for each data point, we get that point's contribution to the error.


2 Answers

Solution: Control the solution space. This might mean using smaller datasets when training, using fewer hidden nodes, or initializing your weights and biases (W and b) differently. Your model is reaching a point where the loss is undefined, which might be due to the gradient being undefined, or to the final_conv signal.
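On the initialization point, one common adjustment (my sketch, not code from the original repository; weight_variable mirrors the helper used in the question) is to shrink the stddev of the truncated-normal init, or switch to a Xavier/Glorot initializer, so the initial logits stay small:

import tensorflow as tf

def weight_variable(shape, stddev=0.01):
    # Smaller stddev than the usual 0.1 keeps initial logits small,
    # so softmax outputs start away from exact 0 or 1.
    return tf.Variable(tf.truncated_normal(shape, stddev=stddev))

def weight_variable_xavier(name, shape):
    # Alternative: Xavier/Glorot initialization scaled by fan-in/fan-out.
    return tf.get_variable(name, shape=shape,
                           initializer=tf.contrib.layers.xavier_initializer())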

Why: Sometimes, no matter what, numerical instability is reached. Eventually, adding a machine epsilon to prevent dividing by zero (as in the cross entropy loss here) just won't help, because even then the number cannot be accurately represented at the precision you are using. (Ref: https://en.wikipedia.org/wiki/Round-off_error and https://floating-point-gui.de/basic/)
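A small numpy illustration of that round-off point (my own example, not from the original answer): an epsilon smaller than float32's machine epsilon is simply swallowed, and even a successful log(0 + eps) only caps the per-sample loss at about 16:

import numpy as np

eps32 = np.finfo(np.float32).eps           # ~1.1920929e-07
one = np.float32(1.0)
print(one + np.float32(1e-8) == one)       # True: 1e-8 is below machine epsilon and is lost
print(-np.log(np.float32(0.0) + eps32))    # ~15.9: the epsilon keeps the loss finite, but only just
print(-np.log(np.float32(0.0)))            # inf: without it the loss is undefined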

Considerations:
1) When tweaking epsilons, be sure to be consistent with your data type: use the machine epsilon of the precision you are using (for float32 that is about 1.19e-7; ref: https://en.wikipedia.org/wiki/Machine_epsilon and python numpy machine epsilon; see the snippet after these considerations).

2) Just in case others reading this are confused: the value passed to the AdamOptimizer constructor is the learning rate, but you can also set the epsilon value (ref: How does parameter epsilon affect AdamOptimizer? and https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer).

3) Numerical instability in TensorFlow is real, and it's difficult to get around. Yes, there is tf.nn.softmax_cross_entropy_with_logits, but it is quite specific (what if you don't want a softmax?). Refer to Vahid Kazemi's 'Effective TensorFlow' for an insightful explanation: https://github.com/vahidk/EffectiveTensorflow#entropy
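To make considerations 1) and 2) concrete, here is a short sketch (the epsilon value 1e-4 is just an example, not a recommendation from this answer) of checking the machine epsilon for your dtype and passing both the learning rate and epsilon to AdamOptimizer in the TF1 API:

import numpy as np
import tensorflow as tf

print(np.finfo(np.float32).eps)   # ~1.1920929e-07
print(np.finfo(np.float64).eps)   # ~2.220446e-16

# The first argument of AdamOptimizer is the learning rate; epsilon is a separate keyword (default 1e-8).
optimizer = tf.train.AdamOptimizer(learning_rate=1e-5,
                                   beta1=0.9, beta2=0.999,
                                   epsilon=1e-4)
# train_step = optimizer.minimize(cross_entropy)   # cross_entropy as defined in the question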

answered Nov 06 '22 by Phil P


That jump in your loss graph is very weird...

I would like you to focus on a few points:

  • If your images are not normalized between 0 and 1, then normalize them.
  • If you have normalized your values between -1 and 1, then use a sigmoid layer instead of softmax, because softmax squashes the values between 0 and 1.
  • Before using softmax, add a sigmoid layer to squash your values (highly recommended).
  • Another thing you can do is add dropout for every layer.
  • I would also suggest using gradient clipping, e.g. tf.clip_by_value or tf.clip_by_norm, so that your gradients do not explode or implode (see the sketch after this list).
  • You can also use L2 regularization.
  • Experiment with the learning rate and epsilon of AdamOptimizer.
  • I would also suggest using TensorBoard to keep track of the weights, so you can see where they are exploding.
  • You can also use TensorBoard to keep track of loss and accuracy.

  • See the softmax formula below:

softmax(x_i) = exp(x_i) / sum_j exp(x_j)

  • Probably that e to the power of x: if x becomes a very large number, the softmax gives infinity and hence the loss is infinity.
  • Heavily use TensorBoard to debug, and print the values of the softmax so that you can figure out where you are going wrong.
  • One more thing I noticed: you are not using any activation functions after the convolution layers. I would suggest adding a leaky ReLU after every convolution layer.
  • Your network is a huge network, and it is important to use leaky ReLU as the activation function so that it adds non-linearity and hence improves performance.
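To make a couple of these bullets concrete, here is a self-contained sketch (toy shapes of my own choosing, not the question's real graph). The first part shows how exp overflows for large logits, which is how a softmax ends up producing NaN/Inf, and how the usual max-subtraction trick avoids it; the second part shows leaky ReLU and clipped gradients with the same TF1-style API used in the question.

import numpy as np
import tensorflow as tf

# Part 1: softmax overflow for large logits
logits = np.array([10.0, 1000.0, 5.0], dtype=np.float32)
naive = np.exp(logits) / np.sum(np.exp(logits))        # exp(1000) overflows -> [0., nan, 0.]
shifted = np.exp(logits - logits.max())
stable = shifted / shifted.sum()                       # [~0., 1., ~0.]: subtract the max first
print(naive, stable)

# Part 2: leaky ReLU + gradient clipping (toy layer sizes so the snippet stands alone)
x  = tf.placeholder(tf.float32, shape=[None, 8])
y_ = tf.placeholder(tf.float32, shape=[None, 3])
W1 = tf.Variable(tf.truncated_normal([8, 16], stddev=0.1))
b1 = tf.Variable(tf.zeros([16]))
h  = tf.nn.leaky_relu(tf.matmul(x, W1) + b1, alpha=0.01)   # leaky ReLU instead of plain ReLU
W2 = tf.Variable(tf.truncated_normal([16, 3], stddev=0.1))
b2 = tf.Variable(tf.zeros([3]))
logits_op = tf.matmul(h, W2) + b2

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits_op))
opt = tf.train.AdamOptimizer(learning_rate=1e-5, epsilon=1e-4)
grads_and_vars = opt.compute_gradients(loss)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars if g is not None]
train_step = opt.apply_gradients(clipped)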

answered Nov 06 '22 by Jai