All, or nearly all, of the papers that use dropout apply it to supervised learning. It seems it could just as easily be used to regularize deep autoencoders, RBMs, and DBNs. So why isn't dropout used in unsupervised learning?
The reason is that convolutional layers have relatively few parameters, so they need less regularization to begin with. Furthermore, because feature maps encode spatial relationships, neighbouring activations tend to be highly correlated, which makes standard element-wise dropout largely ineffective, as sketched below.
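A minimal sketch (assuming PyTorch; the tensor sizes are illustrative only) of that point: ordinary `nn.Dropout` zeroes individual activations, which highly correlated neighbours in a feature map can largely compensate for, whereas channel-wise `nn.Dropout2d` zeroes entire feature maps at once.

```python
import torch
import torch.nn as nn

# Element-wise dropout: zeroes individual activations. Within a feature map,
# neighbouring activations are strongly correlated, so the surviving
# neighbours still carry nearly the same information.
elementwise = nn.Dropout(p=0.5)

# Channel-wise ("spatial") dropout: zeroes whole feature maps, so correlated
# neighbours cannot compensate for each other.
channelwise = nn.Dropout2d(p=0.5)

x = torch.randn(8, 16, 32, 32)  # (batch, channels, height, width)

elementwise.train()
channelwise.train()
print(elementwise(x).eq(0).float().mean())  # roughly half of the individual elements are zeroed
print(channelwise(x).eq(0).float().mean())  # whole channels are zeroed instead
```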
You should use dropout to prevent overfitting, especially when the training set is small. Dropout injects randomness into the training process, so the weights are optimized for the general problem rather than for noise in the data.
Typically, dropout is applied after the non-linear activation function. However, when using rectified linear units (ReLUs), it can make sense to apply dropout before the activation, which can be slightly more efficient depending on the particular implementation.
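For concreteness, a small sketch (assuming PyTorch; layer sizes are arbitrary) of the two orderings. With ReLU they produce identical outputs, because ReLU commutes with multiplication by the non-negative dropout mask, which is why the choice can be made purely on efficiency grounds.

```python
import torch.nn as nn

# Ordering 1: dropout after the non-linearity -- the common default.
after_relu = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)

# Ordering 2: dropout before the non-linearity. For ReLU the two orderings
# are mathematically equivalent: relu(mask * x) == mask * relu(x), since the
# scaled dropout mask is non-negative.
before_relu = nn.Sequential(
    nn.Linear(128, 64),
    nn.Dropout(p=0.5),
    nn.ReLU(),
    nn.Linear(64, 10),
)
```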
We can apply a Dropout layer to the input vector, in which case it nullifies some of its features; but we can also apply it to a hidden layer, in which case it nullifies some hidden neurons. Dropout layers are important in training CNNs because they prevent overfitting on the training data.
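As an illustration of both placements (a sketch assuming PyTorch; the architecture, rates, and 28x28 single-channel input are only an example), dropout on the input zeroes input features, while dropout between hidden layers zeroes hidden neurons:

```python
import torch.nn as nn

# A toy CNN for 28x28 grayscale images showing the two placements.
# The dropout rates and layer sizes are illustrative, not prescriptive.
model = nn.Sequential(
    nn.Dropout(p=0.2),                          # input dropout: zeroes some input features
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 14 * 14, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),                          # hidden dropout: zeroes some hidden neurons
    nn.Linear(128, 10),
)
```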
Dropout is used in unsupervised learning. For example:
Shuangfei Zhai, Zhongfei Zhang: Dropout Training of Matrix Factorization and Autoencoder for Link Prediction in Sparse Graphs (arXiv, 14 Dec 2015)
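As a simple illustration of dropout in an unsupervised setting (a generic sketch assuming PyTorch, not the method of the paper above), dropout can regularize an autoencoder trained purely on reconstruction error, with no labels involved:

```python
import torch
import torch.nn as nn

# A plain autoencoder regularized with dropout and trained only on
# reconstruction loss. Dimensions and the dropout rate are illustrative.
class DropoutAutoencoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=64, p=0.3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Dropout(p),              # dropping inputs also acts like denoising-style corruption
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
        )
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DropoutAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)                 # a random unlabeled batch as a stand-in
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
opt.step()
```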
Labeled data is relatively scarce, which is why supervised learning often benefits from strong regularization such as dropout.
On the other hand, unlabeled data is usually plentiful, which is why dropout is typically not needed there and may even be detrimental, since it reduces the effective model capacity.
Even gigantic models like GPT-3 (175 billion parameters) are still underfitting after training on 300 billion tokens.