I am attempting to replicate an deep convolution neural network from a research paper. I have implemented the architecture, but after 10 epochs, my cross entropy loss suddenly increases to infinity. This can be seen in the chart below. You can ignore what happens to the accuracy after the problem occurs.
Here is the github repository with a picture of the architecture
After doing some research I think using an AdamOptimizer or relu might be a problem.
x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])
#Many Convolutions and Relus omitted
final = tf.reshape(final, [-1, 7168])
keep_prob = tf.placeholder(tf.float32)
W_final = weight_variable([7168,7168,3])
b_final = bias_variable([7168,3])
final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=final_conv))
train_step = tf.train.AdamOptimizer(1e-5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(final_conv, 2), tf.argmax(y_, 2))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
EDIT If anyone is interested, the solution was that I was basically feeding in incorrect data.
Cross Entropy loss: if the right class is predicted as 1, then the loss would be 0; if the right class if predicted as 0 ( totally wrong ), then the loss would be infinity.
One possible reason could be the numerical instability of some weights or gradients. For example, some weights or gradients might become too small, so when you do the calculations with them, it gives incorrect ("exploding") results. Same could happen if they become too large. Save this answer.
Cross entropy loss is a metric used to measure how well a classification model in machine learning performs. The loss (or error) is measured as a number between 0 and 1, with 0 being a perfect model. The goal is generally to get your model as close to 0 as possible.
The use of negative logs on probabilities is what is known as the cross-entropy, where a high number means bad models and a low number means a good model. When we calculate the log for each data point, we actually get the error function for each point.
Solution: Control the solution space. This might mean using smaller datasets when training, it might mean using less hidden nodes, it might mean initializing your wb differently. Your model is reaching a point where the loss is undefined, which might be due to the gradient being undefined, or the final_conv signal.
Why: Sometimes no matter what, a numerical instability is reached. Eventually adding a machine epsilon to prevent dividing by zero (cross entropy loss here) just won't help because even then the number cannot be accurately represented by the precision you are using. (Ref: https://en.wikipedia.org/wiki/Round-off_error and https://floating-point-gui.de/basic/)
Considerations:
1) When tweaking epsilons, be sure to be consistent with your data type (Use the machine epsilon of the precision you are using, in your case float32 is 1e-6 ref: https://en.wikipedia.org/wiki/Machine_epsilon and python numpy machine epsilon.
2) Just in-case others reading this are confused: The value in the constructor for Adamoptimizer is the learning rate, but you can set the epsilon value (ref: How does paramater epsilon affects AdamOptimizer? and https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer)
3) Numerical instability of tensorflow is there and its difficult to get around. Yes there is tf.nn.softmax_with_cross_entropy but this is too specific (what if you don't want a softmax?). Refer to Vahid Kazemi's 'Effective Tensorflow' for an insightful explanation: https://github.com/vahidk/EffectiveTensorflow#entropy
that jump in your loss graph is very weird...
I would like you to focus on few points :
tf.clip
so that your gradients does not explode and implodeYou can also use tensor-board for keeping track of loss and accuracy
See The softmax formula below:
leaky relu
after every convolution layerleaky relu
as activation function so that it adds non-linearity and hence improves the performance If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With