 

What layers should experience "dropout" when training a Neural Network?

I have a multilayer network with ReLU activations in the hidden layers and a sigmoid activation in the output layer. I want to implement dropout, where each neuron has some probability of outputting zero during training.

I was thinking I could just introduce this noise as part of the ReLU activation routine during training and be done with it (see the sketch below), but I wasn't sure whether, in principle, dropout extends to the visible/output layer as well.


(In my mind, dropout reduces over-fitting because it effectively makes the network an average of many smaller networks. I'm just not sure about the output layer.)
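To show what I mean, here is a minimal NumPy sketch of the idea; the function name, the drop_prob parameter, and the inverted-dropout scaling are illustrative assumptions on my part, not settled choices:

```python
import numpy as np

def relu_with_dropout(x, drop_prob=0.5, training=True):
    """ReLU activation with (inverted) dropout applied during training."""
    a = np.maximum(0.0, x)  # standard ReLU
    if not training:
        return a
    keep_prob = 1.0 - drop_prob
    # Zero each unit with probability drop_prob; dividing the surviving
    # units by keep_prob keeps the expected activation unchanged, so no
    # extra rescaling is needed at test time.
    mask = (np.random.rand(*a.shape) < keep_prob) / keep_prob
    return a * mask
```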

Andrew Weir asked Jun 30 '16


1 Answer

Yes, you are right: you should not apply dropout to the output layer. Intuitively, introducing such noise makes the output of your network likely to be independent of the network's structure: no matter what computations the hidden layers performed, with some probability the output would ignore them entirely. This is exactly the opposite of what modeling is supposed to achieve.
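To make that concrete, here is a minimal sketch (plain NumPy, bias terms omitted for brevity, and the inverted-dropout scaling is an assumption on my part) showing dropout applied only after the hidden ReLU layers, with the sigmoid output left untouched:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, hidden_weights, output_weights, drop_prob=0.5, training=True):
    """Forward pass with dropout on hidden layers only."""
    a = x
    keep_prob = 1.0 - drop_prob
    for W in hidden_weights:
        a = np.maximum(0.0, a @ W)  # ReLU hidden layer
        if training:
            # Inverted dropout: mask hidden units during training only.
            mask = (np.random.rand(*a.shape) < keep_prob) / keep_prob
            a = a * mask
    # Output layer: no dropout, so the prediction always depends on
    # the full hidden representation.
    return sigmoid(a @ output_weights)
```

At test time you would call this with training=False, so the full network (with no units dropped) produces the prediction.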

Marcin Możejko answered Oct 23 '22