
Validation loss oscillates a lot, validation accuracy > training accuracy, but test accuracy is high. Is my model overfitting?

Tags: optimization, deep-learning, conv-neural-network

I am training a model using the author's original learning rate (I am also using their GitHub code), and my validation loss oscillates a lot: it decreases, then suddenly jumps to a large value, then decreases again. It never really converges; the lowest it gets is about 2, while the training loss converges to 0.0-something, well below 1.

At each epoch I compute the training accuracy and, at the end of the epoch, the validation accuracy. The validation accuracy is always higher than the training accuracy.

When I evaluate on the real test data I get good results, but I wonder whether my model is overfitting. I would expect a good model's validation loss to converge similarly to its training loss, but that doesn't happen, and the fact that the validation loss occasionally spikes to very large values worries me.

After adjusting the learning rate, the scheduler, and so on, I got both the validation loss and the training loss to decrease steadily with less oscillation, but this time the test accuracy stays low (as do the training and validation accuracies).

I tried a few optimizers (Adam, SGD, Adagrad) with a step scheduler and also PyTorch's plateau scheduler (ReduceLROnPlateau), and I experimented with step sizes, but it didn't really help; neither did gradient clipping.

1. Is my model overfitting?
2. If so, how can I reduce the overfitting, other than with data augmentation?
3. If not (some people on Quora say it is nothing to worry about, though I would think it must be overfitting), how can I justify it? Even if I got similar results in a k-fold experiment, would that be good enough? I don't feel it would explain the oscillation. How should I proceed?


asked Mar 26 '19 00:03

dusa

1 Answer

The training loss at each epoch is usually computed on the entire training set.
The validation loss at each epoch is usually computed on a single minibatch of the validation set, so it is normal for it to be noisier.
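If you want the per-epoch validation loss to be less noisy at the source, one option is to evaluate it over every validation batch rather than a single minibatch. Below is a minimal, framework-agnostic sketch; `full_validation_loss`, `batch_loss` and the toy `mse` loss are hypothetical names, and it assumes `batch_loss` returns the mean loss over one batch (as PyTorch loss functions do with their default `reduction='mean'`). In PyTorch you would typically run such a loop after `model.eval()` and inside `torch.no_grad()`.

```python
from typing import Callable, Iterable, Sequence, Tuple

# One validation batch: (inputs, targets).
Batch = Tuple[Sequence[float], Sequence[float]]


def full_validation_loss(
    batches: Iterable[Batch],
    batch_loss: Callable[[Sequence[float], Sequence[float]], float],
) -> float:
    """Mean per-example loss over *all* validation batches.

    Assumes batch_loss returns the mean loss of one batch, so each batch
    mean is re-weighted by its batch size before averaging.
    """
    total, count = 0.0, 0
    for inputs, targets in batches:
        n = len(targets)
        if n == 0:
            continue  # skip empty batches
        total += batch_loss(inputs, targets) * n
        count += n
    if count == 0:
        raise ValueError("validation set is empty")
    return total / count


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Hypothetical toy loss: mean squared error of one batch."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


if __name__ == "__main__":
    # Toy batches where the "predictions" are passed directly as inputs;
    # the last batch is smaller, as often happens with a final partial batch.
    val_batches = [([0.0, 0.0], [1.0, 1.0]), ([0.0], [2.0])]
    print(full_validation_loss(val_batches, mse))  # 2.0
```

Weighting by batch size matters when the last batch is smaller: on the toy data a plain average of the two batch means would give 2.5, while the true per-example mean is 2.0.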

Solution: report an exponential moving average (EMA) of the validation loss across epochs, which smooths out the fluctuations.
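A minimal sketch of that smoothing in plain Python, assuming you keep a list of per-epoch validation losses. The function name `ema_smooth` and the smoothing factor `alpha` are illustrative choices, not part of the original answer; each smoothed value is ema_t = alpha * x_t + (1 - alpha) * ema_(t-1), seeded with the first raw value.

```python
from typing import Iterable, List, Optional


def ema_smooth(values: Iterable[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average of per-epoch validation losses.

    alpha is the weight given to the newest value (0 < alpha <= 1); a smaller
    alpha gives a smoother but more lagged curve. The first smoothed value is
    the first raw value (no bias correction).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    ema: Optional[float] = None
    for v in values:
        ema = v if ema is None else alpha * v + (1.0 - alpha) * ema
        smoothed.append(ema)
    return smoothed


if __name__ == "__main__":
    # Illustrative losses only, not numbers from the question.
    print(ema_smooth([2.0, 4.0, 2.0], alpha=0.5))  # [2.0, 3.0, 2.5]
```

The EMA only changes how the curve is reported, not the model itself, so it is worth logging the raw series alongside the smoothed one.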

It is not overfitting, since your validation accuracy is not lower than your training accuracy. In fact, because your validation accuracy is higher than your training accuracy, it sounds like your model is underfitting.


answered Sep 28 '22 12:09

Soroush
