 

Validation and Testing accuracy widely different

I am currently working on a dataset on Kaggle. After training the model on the training data, I tested it on the validation data and got an accuracy of around 0.49.

However, the same model gives an accuracy of 0.05 on the testing data.

I am using a neural network as my model.

So, what are the possible reasons for this to happen, and how does one begin to check for and correct these issues?

asked Feb 10 '18 by user3828311


People also ask

Is validation accuracy and testing accuracy same?

In other words, the test (or testing) accuracy often refers to the validation accuracy, that is, the accuracy you calculate on the data set you do not use for training, but you use (during the training process) for validating (or "testing") the generalisation ability of your model or for "early stopping".

Why test accuracy is less than validation accuracy?

If your model's accuracy on your testing data is lower than your training or validation accuracy, it usually indicates that there are meaningful differences between the kind of data you trained the model on and the testing data you're providing for evaluation.

Is validation and testing same?

One point of confusion for students is the difference between the validation set and the test set. In simple terms, the validation set is used to tune the model's hyperparameters, while the test set is used to provide an unbiased estimate of the final model.

Why validation accuracy is higher than testing accuracy?

Dropout is active during training, which depresses the reported training accuracy. However, when evaluating validation and test accuracy, dropout is NOT active, so the model is actually more accurate. This increase in accuracy might be enough to overcome the decrease due to overfitting, especially in cases like this one, where the accuracy differences appear to be quite small.


1 Answer

Reasons for a high generalization gap:

  1. Different distributions: The validation and test sets might come from different distributions. Verify in your code that they are indeed sampled from the same process (a quick class-frequency comparison is sketched after this list).
  2. Number of samples: The validation and/or test set is too small. In that case the empirical data distributions differ too much, which explains the different reported accuracies. One example would be a dataset consisting of thousands of images but also thousands of classes; the test set might then contain classes that are not in the validation set (and vice versa). Use cross-validation to check whether the test accuracy is always lower than the validation accuracy, or whether they just generally differ a lot in each fold (see the cross-validation sketch below).
  3. Hyperparameter overfitting: This is also related to the size of the two sets. Did you do hyperparameter tuning? If so, check whether the accuracy gap existed before you tuned the hyperparameters, as you might have "overfitted" the hyperparameters to the validation set.
  4. Loss function vs. accuracy: You reported different accuracies. Did you also check the train, validation, and test losses? You train your model on the loss function, so this is the most direct performance measure. If the accuracy is only loosely coupled to your loss function and the test loss is approximately as low as the validation loss, that may explain the accuracy gap (a loss-comparison sketch follows below).
  5. Bug in the code: If the test and validation sets are sampled from the same process and are sufficiently large, they are interchangeable, which means the test and validation losses must be approximately equal. So, if you have checked the four points above, my next best guess would be a bug in the code; for example, you accidentally trained your model on the validation set as well (a split-overlap check is sketched below). You might also want to train your model on a larger dataset and then check whether the accuracies still diverge.
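To check point 1, here is a minimal sketch in Python that compares the class frequencies of the validation and test sets. The file paths and the array names `y_val` / `y_test` are placeholders for however you load your own splits:

```python
import numpy as np

def class_frequencies(y):
    """Return a {class label: relative frequency} mapping for a 1-D label array."""
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels, counts / counts.sum()))

# Placeholder loading step: substitute your own validation/test labels here.
y_val = np.load("y_val.npy")
y_test = np.load("y_test.npy")

val_freq = class_frequencies(y_val)
test_freq = class_frequencies(y_test)

# Classes present in one split but missing from the other are an immediate
# red flag (see point 2); large frequency gaps suggest different distributions.
for c in sorted(set(val_freq) | set(test_freq)):
    print(f"class {c}: val={val_freq.get(c, 0.0):.3f}  test={test_freq.get(c, 0.0):.3f}")
```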
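For the cross-validation check in point 2, a sketch using scikit-learn. `build_model` is a hypothetical factory returning a fresh, untrained classifier with an sklearn-style `fit`/`predict` interface, and `X`, `y` are the pooled training and validation data:

```python
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

def cross_val_gap(build_model, X, y, X_test, y_test, n_splits=5):
    """Print per-fold validation accuracy next to the test accuracy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
        model = build_model()  # fresh model for every fold
        model.fit(X[train_idx], y[train_idx])
        val_acc = accuracy_score(y[val_idx], model.predict(X[val_idx]))
        test_acc = accuracy_score(y_test, model.predict(X_test))
        # A test accuracy that is consistently far below every fold's
        # validation accuracy points to a distribution mismatch rather
        # than random fold-to-fold noise.
        print(f"fold {fold}: val={val_acc:.3f}  test={test_acc:.3f}")
```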
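For point 4, a sketch of the loss comparison, assuming a Keras model compiled with `metrics=['accuracy']`; the `model` and the split arrays are taken from your own pipeline:

```python
# model.evaluate returns (loss, accuracy) for a model compiled with a
# single 'accuracy' metric. If the test loss matches the validation loss
# while the accuracies diverge, the gap is an artefact of the accuracy
# metric rather than a genuine generalisation failure.
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"train: loss={train_loss:.4f} acc={train_acc:.3f}")
print(f"val:   loss={val_loss:.4f} acc={val_acc:.3f}")
print(f"test:  loss={test_loss:.4f} acc={test_acc:.3f}")
```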
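Finally, for point 5, a quick sanity check against accidental overlap between splits. `train_idx`, `val_idx`, and `test_idx` are assumed to be the integer indices you used to build each split:

```python
# If any sample index appears in more than one split, the model has seen
# evaluation data during training (or tuning), and the reported accuracies
# are not trustworthy.
train_set, val_set, test_set = set(train_idx), set(val_idx), set(test_idx)
assert not (train_set & val_set), "train/validation overlap: data leakage"
assert not (train_set & test_set), "train/test overlap: data leakage"
assert not (val_set & test_set), "validation/test overlap"
```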
answered Nov 03 '22 by Kilian Batzner