 

Loss suddenly increases with Adam Optimizer in Tensorflow

I am using a CNN for a regression task, with TensorFlow and the Adam optimizer. The network seems to converge perfectly fine until one point, where the loss suddenly increases along with the validation error. Here are the loss plots for the labels and the weights separately (the optimizer is run on their sum):

[Plots: label loss, weight loss]

I use an L2 loss for weight regularization and also for the labels. I apply some randomness to the training data. I am currently trying RMSProp to see if the behavior changes, but it takes at least 8 hours to reproduce the error.
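For reference, a minimal sketch of the loss setup described above, assuming TensorFlow 2.x / tf.keras; the tiny model, regularization scale, and learning rate are placeholders, not the original code:

```python
import tensorflow as tf

# Sketch of the described setup: an L2 label loss plus an L2 penalty on the
# weights, summed into one scalar that the Adam optimizer is run on.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 1)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),            # single regression output
])
reg_scale = 1e-4                         # hypothetical regularization strength

def total_loss(x, y_true):
    y_pred = model(x, training=True)
    label_loss = tf.reduce_mean(tf.square(y_true - y_pred))                # L2 loss on the labels
    weight_loss = tf.add_n([tf.nn.l2_loss(w) for w in model.trainable_weights])
    return label_loss + reg_scale * weight_loss                            # optimizer runs on the sum

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
```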

I would like to understand how this can happen. Hope you can help me.

asked Feb 14 '17 by andre_bauer

People also ask

Why does training loss go up?

This increase in loss value is due to Adam: once a local minimum is overshot and a certain number of iterations have passed, a small number is divided by an even smaller number and the loss value explodes.
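A toy numeric illustration of that division, with assumed magnitudes: once the second-moment estimate has decayed to almost nothing, even a tiny first-moment estimate produces an effective step many times larger than the raw gradient.

```python
import numpy as np

# Adam's update ratio is m_hat / (sqrt(v_hat) + eps).  Near a flat minimum the
# squared-gradient average v_hat can become far smaller than the gradient average
# m_hat, inflating the step.  The values below are illustrative, not measured.
eps = 1e-8
m_hat = 1e-6          # small first-moment estimate (recent gradient direction)
v_hat = 1e-14         # even smaller second-moment estimate (recent squared gradients)
step_ratio = m_hat / (np.sqrt(v_hat) + eps)
print(step_ratio)     # ~9: the parameter moves ~9x the learning rate despite a ~1e-6 gradient
```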

Does Adam Optimizer reduce learning rate?

Adam is an adaptive-learning-rate optimizer that is very popular for deep learning, especially in computer vision. I have seen papers in which, after a specific number of epochs (for example, 50), the learning rate is decreased by dividing it by 10.
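A hedged sketch of that kind of step decay, assuming TensorFlow 2.x and tf.keras; the 50-epoch boundary follows the example above, and the rates are placeholders:

```python
import tensorflow as tf

# Divide the learning rate by 10 every 50 epochs using a Keras callback.
def step_decay(epoch, lr):
    return lr * 0.1 if epoch > 0 and epoch % 50 == 0 else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
# model.fit(x, y, epochs=150, callbacks=[lr_callback])
```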

Why does training loss fluctuate?

If the model output is constant, the training loss would fluctuate (because of the different samples in different batches) whereas the validation loss would remain constant. Maybe try altering the learning rate or batch size, or add more data (to the validation set as well).

Does learning rate matter for Adam Optimizer?

Yes, absolutely. From my own experience, it's very useful to use Adam with learning rate decay. Without decay, you have to set a very small learning rate so the loss won't begin to diverge after decreasing to a point. Here, I post the code to use Adam with learning rate decay in TensorFlow.
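A minimal sketch of that combination, assuming TensorFlow 2.x and the tf.keras API; the initial rate, decay steps, and decay rate are placeholders, not the answerer's values:

```python
import tensorflow as tf

# Adam driven by an exponentially decaying learning-rate schedule.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=10000,
    decay_rate=0.96,
    staircase=True)

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
# model.compile(optimizer=optimizer, loss="mse")
```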


1 Answer

My experience over the last months is the following: Adam is very easy to use because you don't have to play with the initial learning rate very much, and it almost always works. However, when it comes to convergence, Adam does not really settle on a solution but jiggles around at higher iteration counts, while SGD gives an almost perfectly shaped loss plot and seems to converge much better at higher iterations. On the other hand, changing little parts of the setup requires adjusting the SGD parameters, or you will end up with NaNs... For experiments on architectures and general approaches I favor Adam, but if you want to get the best version of one chosen architecture you should use SGD and at least compare the solutions.

I also noticed that a good initial SGD setup (learning rate, weight decay, etc.) converges as fast as using Adam, at least for my setup. Hope this may help some of you!
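For comparison, an illustrative SGD baseline of the kind mentioned (momentum plus an explicit L2 weight penalty), assuming tf.keras; all values are placeholder guesses, not the answerer's settings:

```python
import tensorflow as tf

# SGD with momentum; weight decay is expressed here as an L2 kernel regularizer.
regularizer = tf.keras.regularizers.l2(1e-4)
layer = tf.keras.layers.Conv2D(32, 3, activation="relu",
                               kernel_regularizer=regularizer)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
```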

EDIT: Please note that the effects in my initial question are NOT normal, even with Adam. It seems I had a bug, but I can't really remember what the issue was.

answered Sep 22 '22 by andre_bauer