I am currently working on a dataset on Kaggle. After training the model on the training data, I tested it on the validation data and got an accuracy of around 0.49.
However, the same model gives an accuracy of 0.05 on the testing data.
I am using a neural network as my model.
So, what are the possible reasons for this to happen and how does one begin to check and correct these issues?
Reasons for a high generalization gap:
- Different distributions: The validation and test sets might come from different distributions. Try to verify that they are indeed sampled from the same process in your code (a label-distribution check is sketched after this list).
- Number of samples: The size of the validation and/or the test set is too small. This means that the empirical data distributions differ too much, explaining the different reported accuracies. One example would be a dataset consisting of thousands of images but also thousands of classes; then the test set might contain some classes that are not in the validation set (and vice versa). Use cross-validation to check whether the test accuracy is always lower than the validation accuracy, or whether the two just generally differ a lot in each fold (a cross-validation sketch also appears after this list).
- Hyperparameter overfitting: This is also related to the size of the two sets. Did you do hyperparameter tuning? If so, check whether the accuracy gap existed before you tuned the hyperparameters, as you might have "overfitted" the hyperparameters to the validation set.
- Loss function vs. accuracy: You reported different accuracies; did you also check the train, validation and test losses? You train your model on the loss function, so this is the most direct performance measure. If the accuracy is only loosely coupled to your loss function and the test loss is approximately as low as the validation loss, that might explain the accuracy gap.
- Bug in the code: If the test and validation sets are sampled from the same process and are sufficiently large, they are interchangeable, which means the test and validation losses must be approximately equal. So, if you have checked the four points above, my next best guess would be a bug in the code, for example accidentally training your model on the validation set as well (the overlap check after this list is one way to spot this). You might also want to train your model on a larger dataset and then check whether the accuracies still diverge.
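To make the first point more concrete, here is a minimal sketch of comparing the class distributions of the two splits. It assumes the labels are available as NumPy arrays; `y_val` and `y_test` are placeholder names for whatever your splits are actually called.

```python
import numpy as np


def label_distribution(y):
    """Return a dict mapping each class label to its relative frequency."""
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels, counts / counts.sum()))


# Placeholder arrays standing in for your real validation / test labels.
y_val = np.random.randint(0, 10, size=5000)
y_test = np.random.randint(0, 10, size=5000)

val_dist = label_distribution(y_val)
test_dist = label_distribution(y_test)

# Flag classes whose relative frequency differs noticeably, and classes
# that appear in one split but not the other.
for c in sorted(set(val_dist) | set(test_dist)):
    pv, pt = val_dist.get(c, 0.0), test_dist.get(c, 0.0)
    if abs(pv - pt) > 0.02 or pv == 0.0 or pt == 0.0:
        print(f"class {c}: val={pv:.3f}, test={pt:.3f}")
```

If this prints many classes, or classes that exist in only one split, the two sets are probably not drawn from the same distribution.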
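For the cross-validation suggestion, a rough sketch using scikit-learn; the `MLPClassifier` and the random data here are just stand-ins for whatever model and dataset you actually use.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for your real features and labels.
X = np.random.rand(500, 20)
y = np.random.randint(0, 5, size=500)

# 5-fold cross-validation: if the per-fold accuracies vary a lot, the
# validation/test gap may simply reflect small, noisy evaluation sets.
scores = cross_val_score(MLPClassifier(max_iter=300), X, y, cv=5)
print("fold accuracies:", np.round(scores, 3))
print("mean:", scores.mean(), "std:", scores.std())
```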
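And for the last point, one rough way to check for accidental leakage between the splits is to look for identical rows across them. A sketch, assuming the features are plain NumPy arrays; `X_train`, `X_val` and `X_test` are placeholder names.

```python
import numpy as np


def row_hashes(X):
    """Hash each row so identical samples can be compared across splits."""
    return {hash(row.tobytes()) for row in np.ascontiguousarray(X)}


# Placeholder arrays standing in for your real feature matrices.
X_train = np.random.rand(1000, 20)
X_val = np.random.rand(200, 20)
X_test = np.random.rand(200, 20)

train_h, val_h, test_h = row_hashes(X_train), row_hashes(X_val), row_hashes(X_test)

# Any non-trivial overlap suggests the same samples ended up in two splits.
print("train/val overlap:", len(train_h & val_h))
print("train/test overlap:", len(train_h & test_h))
print("val/test overlap:", len(val_h & test_h))
```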