I have this multilayer network with ReLU hidden layer activations and Sigmoid output layer activations. I want to implement dropout (where each neuron has a chance to just output zero during training).
I was thinking I could just introduce this noise as part of the ReLU activation routine during training and be done with it, but I wasn't sure if, in principle, dropout extends to the visible/output layer or not.
(In my mind, dropout reduces over-fitting because it effectively makes the network an average of many smaller networks; I'm just not sure whether that reasoning applies to the output layer.)
Yes, you are right: you should not apply dropout to the output layer. Intuitively, introducing that noise at the output makes the network's prediction largely independent of its structure. No matter what the hidden layers computed, with some probability the output gets zeroed out anyway, so the prediction can no longer reflect those computations. That is exactly the opposite of what modeling is supposed to achieve. Keep dropout on the hidden layers (your ReLU activations are a fine place for it) and leave the sigmoid output layer untouched.
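For concreteness, here is a minimal NumPy sketch of what that looks like; this is my own illustration rather than code from the question, and the layer sizes, keep probability, and names like `forward` and `keep_prob` are assumptions made for the example. It applies inverted dropout right after the ReLU hidden layer during training and applies nothing to the sigmoid output.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2, keep_prob=0.8, training=True):
    # Hidden layer: ReLU activation, then dropout (training only).
    h = relu(x @ W1 + b1)
    if training:
        # Inverted dropout: zero each unit with probability (1 - keep_prob)
        # and rescale survivors so the expected activation is unchanged,
        # which means no rescaling is needed at test time.
        mask = (rng.random(h.shape) < keep_prob) / keep_prob
        h = h * mask
    # Output layer: sigmoid, with NO dropout applied.
    return sigmoid(h @ W2 + b2)

# Tiny usage example with random weights (shapes are illustrative).
x = rng.normal(size=(4, 10))                       # batch of 4 inputs, 10 features
W1, b1 = rng.normal(size=(10, 32)), np.zeros(32)   # hidden layer of 32 units
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)     # single sigmoid output
print(forward(x, W1, b1, W2, b2, training=True).shape)   # (4, 1)
print(forward(x, W1, b1, W2, b2, training=False).shape)  # (4, 1)
```

The inverted-dropout scaling by `1 / keep_prob` is a common convenience: it keeps the expected activation the same between training and inference, so the test-time forward pass needs no special handling.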