
Why disable dropout during validation and testing?


I've seen in multiple places that you should disable dropout during the validation and testing stages and only keep it during the training phase. Is there a reason why that should be the case? I haven't been able to find a good explanation and was just wondering.

One reason I'm asking is that I trained a model with dropout, and the results turned out well - about 80% accuracy. Then I went on to validate the model but forgot to set the keep probability to 1, and the model's accuracy dropped to about 70%. Is it supposed to be that drastic? And is the fix as simple as setting the keep probability to 1 in each dropout layer?

Thanks in advance!

asked May 28 '17 by dooder


People also ask

How does dropout affect accuracy?

With a dropout rate below a certain threshold, accuracy will gradually increase and loss will gradually decrease (that is what is happening in your case). When you increase dropout beyond that threshold, the model is no longer able to fit properly.

Is dropout used during testing?

Dropout is only used during training to make the network more robust to fluctuations in the training data. At test time, however, you want to use the full network in all its glory. In other words, you do not apply dropout to the test data or during inference in production.

Is dropout applied on validation set?

Q: Are dropout layers applied to validation data in Keras? A: No.

How does dropout work during testing?

Dropout is a technique where randomly selected neurons are ignored during training. They are "dropped out" randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and any weight updates are not applied to the neuron on the backward pass.


2 Answers

Dropout is a random process of disabling neurons in a layer with probability p. In effect, you are making neurons produce 'wrong' outputs on purpose in each iteration, so that they rely less on the outputs of any particular node in the previous layer. This is a method of regularization and reduces overfitting.

However, there are two main reasons you should not use dropout on test data:

  • Dropout makes neurons output 'wrong' values on purpose.
  • Because neurons are disabled at random, the network produces a different output on every forward pass over the same input. This undermines consistency.
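The second point is easy to demonstrate. Here is a minimal NumPy sketch (the function name and sizes are illustrative, not from the question): a dropout layer left enabled at inference gives a different output on every pass over the same input.

```python
import numpy as np

rng = np.random.default_rng()

def dropout_layer(x, p_drop=0.5):
    # Disable each unit independently with probability p_drop, and
    # scale the survivors by 1/(1 - p_drop) (inverted dropout) so the
    # layer's expected output is unchanged.
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

x = np.ones(100)
out1 = dropout_layer(x)
out2 = dropout_layer(x)
# Two "inference" passes over the same input almost surely differ:
print(np.array_equal(out1, out2))
# With dropout disabled, the output is simply x, the same every time.
```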

However, you might want to read some more on what validation/testing exactly is:

Training set: a set of examples used for learning, i.e. to fit the parameters of the classifier. In the MLP case, we would use the training set to find the "optimal" weights with the back-prop rule.

Validation set: a set of examples used to tune the parameters of a classifier. In the MLP case, we would use the validation set to find the "optimal" number of hidden units or to determine a stopping point for the back-propagation algorithm.

Test set: a set of examples used only to assess the performance of a fully-trained classifier. In the MLP case, we would use the test set to estimate the error rate after we have chosen the final model (MLP size and actual weights). After assessing the final model on the test set, YOU MUST NOT tune the model any further!

Why separate test and validation sets? Because the error-rate estimate of the final model on validation data is biased (smaller than the true error rate), since the validation set is used to select the final model.

Source: Introduction to Pattern Analysis, Ricardo Gutierrez-Osuna, Texas A&M University.
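The three roles above can be sketched as a simple disjoint index split (the dataset size and split proportions below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples = 100
indices = rng.permutation(n_examples)

train_idx = indices[:70]    # fit the weights (e.g. via back-prop)
val_idx   = indices[70:85]  # tune hyperparameters / pick a stopping point
test_idx  = indices[85:]    # one-shot estimate of the final error rate

# The three sets are disjoint and together cover every example exactly once.
print(len(train_idx), len(val_idx), len(test_idx))
```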

So even for validation, how would you decide which nodes to remove if each node has a random probability of being deactivated?

answered Sep 19 '22 by Thomas Wagenaar


Dropout is a method of making bagging practical for ensembles of very many large neural networks.


Along the same lines, we may recall the following naive (and, as we will see, incomplete) explanation: for new data, we can predict classes by taking the average of the results from all N learners:

p_ensemble(y|x) = (1/N) · Σ_{i=1..N} p_i(y|x)

Since N is a constant, we can ignore it and the predicted class remains the same; so, the explanation goes, we should disable dropout during validation and testing.
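The point that dividing by the constant N cannot change the prediction is easy to check numerically (the per-learner scores below are made up for illustration):

```python
import numpy as np

# Class scores from N = 3 hypothetical learners, for 3 classes.
scores = np.array([[0.2, 0.5, 0.3],
                   [0.1, 0.6, 0.3],
                   [0.3, 0.4, 0.3]])
N = scores.shape[0]

summed = scores.sum(axis=0)   # total vote per class
averaged = summed / N         # ensemble average per class

# Rescaling by the constant 1/N cannot change which class is largest:
print(summed.argmax() == averaged.argmax())
```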


The true reason is much more complex. It is because of the weight scaling inference rule:

We can approximate p_{ensemble} by evaluating p(y|x) in one model: the model with all units, but with the weights going out of unit i multiplied by the probability of including unit i. The motivation for this modification is to capture the right expected value of the output from that unit. There is not yet any theoretical argument for the accuracy of this approximate inference rule in deep nonlinear networks, but empirically it performs very well.

When we train the model with dropout (for example, in one layer), we zero out some neurons' outputs and scale the rest up by 1/keep_prob, so that the expected value of the layer stays almost the same as before. At prediction time we could keep dropout enabled, but we would then get a different prediction on every run because values are dropped at random, and we would need to run the prediction many times and average to estimate the expected output. Such a process is time-consuming, so instead we remove dropout entirely, and the expectation of the layer remains the same.
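A small NumPy sketch of this bookkeeping (the keep_prob value and layer contents are illustrative): scaling surviving units by 1/keep_prob during training keeps the layer's expectation equal to the deterministic, dropout-free output used at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

keep_prob = 0.8
x = np.full(100_000, 1.0)  # a layer output of all ones, for illustration

# Training: drop units with probability 1 - keep_prob,
# and scale the survivors up by 1/keep_prob.
mask = rng.random(x.shape) < keep_prob
train_out = x * mask / keep_prob

# Inference: no dropout at all.
infer_out = x

# The training-time mean matches the deterministic inference output.
print(train_out.mean())  # ≈ 1.0
print(infer_out.mean())  # exactly 1.0
```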

Reference:

  1. Difference between Bagging and Boosting?
  2. Section 7.12 of Deep Learning (Goodfellow, Bengio, and Courville)
answered Sep 21 '22 by Lerner Zhang