I am trying to understand the effect of dropout on validation Mean Absolute Error (non-linear regression problem).
Without dropout
With dropout of 0.05
With dropout of 0.075
Without any dropouts the validation loss is more than training loss as shown in 1. My understanding is that the validation loss should only be slightly more than the training loss for a good fit.
Carefully, I increased the dropout so that validation loss is close to the training loss as seen in 2. The dropout is only applied during training and not during validation, hence the validation loss is lower than the training loss.
Finally the dropout was increased further and the validation loss again became more than the training loss in 3.
Which amongst these three should be called as a good fit?
Following the response of Marcin Możejko, I predicted against three tests as shown in 4. The 'Y' axis shows RMS error instead of MAE. The model 'without dropout' gave the best result.
Dropout penalizes model variance by randomly freezing neurons in a layer during model training. Like L1 and L2 regularization, dropout is only applicable during the training process and affects training loss, leading to cases where validation loss is lower than training loss. Which is expected.
With dropout (dropout rate less than some small value), the accuracy will gradually increase and loss will gradually decrease first(That is what is happening in your case). When you increase dropout beyond a certain threshold, it results in the model not being able to fit properly.
Q: Are dropout layers applied to validation data in Keras? A: No.
Don't use dropout after the training phase as you do not want to ignore inputs and signals when the network is in use. Moreover, using dropout in the last layer is not ideal. Convolutional layers should not use the 1-D version of dropout.
Well - this a really good question. In my opinion - the lowest validation score (confirmed on a separate test set) is the best fit. Remember that in the end - the performance of your model on a totally new data is the most crucial thing and the fact that it performed even better on a training set is not so important.
Moreover - I think that your model might generaly underfit - and you could try extend it to e.g. have more layers or neurons and prune it a little bit using dropout
in order to prevent example memoization.
If my hypothesis turned out to be false - remember - that it still might be possible that there are certain data patterns present only on validation set (this relatively often in case of medium size datasets) what makes the divergence of train and test loss. Moreover - I think that even though that your losses values saturated in case without dropout there is still a room for improvement by simple increase in number of epochs as there seems to be a trend for losses to be smaller.
Another technique I recommend you to try is reducing learning rate on plateau (using example this callback) as your model seems to need refinement with lower value learning rate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With