I realize this post is asking a similar question to this one. But I just wanted some clarification, preferably a link to some Keras documentation that states the difference.
In my mind, dropout works between neurons, and recurrent_dropout works on each neuron between timesteps. But I have no grounding for this whatsoever.
The documentation on the Keras website is not helpful at all.
After the LSTM you have shape = (None, 10). So you use Dropout the same way you would use it in any fully connected network: it drops a different group of features for each sample.
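For example, a minimal sketch (the shapes are arbitrary, assuming sequences of 20 timesteps with 5 features and a 10-unit LSTM):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Illustrative shapes only: 20 timesteps, 5 features, 10 LSTM units.
model = Sequential([
    LSTM(10, input_shape=(20, 5)),   # output shape: (None, 10)
    Dropout(0.5),                    # regular dropout on the 10 output features
    Dense(1, activation="sigmoid"),
])
model.summary()

Here Dropout sits after the recurrent layer, so it only masks the final feature vector and has nothing to do with what happens between timesteps.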
Recurrent Dropout is a regularization method for recurrent neural networks. Dropout is applied to the updates to LSTM memory cells (or GRU states), i.e. it drops out the input/update gate in LSTM/GRU.
It is used to fight overfitting in the recurrent layers and to improve the generalization power of recurrent networks.
The dropout probability started at 0.5 and linearly decreased to 0.0 after 8 epochs, after which no dropout was used. In [8], dropout was applied to LSTMs at the point where the input comes from the previous layer (this is equivalent to our "Location 2" below).
The Keras LSTM documentation contains a high-level explanation:
dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
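In code, both arguments are passed directly to the recurrent layer itself; a minimal sketch (unit count and rates are arbitrary):

from tensorflow.keras.layers import LSTM

layer = LSTM(
    10,
    dropout=0.2,            # mask applied to the inputs x_t at each timestep
    recurrent_dropout=0.2,  # mask applied to the recurrent state h_{t-1} at each timestep
)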
But this totally corresponds to the answer you refer to:
Regular dropout is applied on the inputs and/or the outputs, meaning the vertical arrows from x_t and to h_t. ... Recurrent dropout masks (or "drops") the connections between the recurrent units; that would be the horizontal arrows in your picture.
If you're interested in the details at the formula level, the best way is to inspect the source code: keras/layers/recurrent.py. Look for rec_dp_mask (the recurrent dropout mask) and dp_mask. One affects h_tm1 (the previous hidden state), the other affects the inputs.
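To make the distinction concrete, here is a simplified sketch of where the two masks enter a single gate's pre-activation. This is not the actual Keras source, just an illustration; the function name and arguments are hypothetical:

import numpy as np

def gate_preactivation(x_t, h_tm1, W, U, b, dp_mask, rec_dp_mask):
    # Simplified sketch (not the real Keras code) of one gate computation.
    # dp_mask      corresponds to `dropout`:           masks features of the input x_t
    # rec_dp_mask  corresponds to `recurrent_dropout`: masks features of the state h_tm1
    x_dropped = x_t * dp_mask
    h_dropped = h_tm1 * rec_dp_mask
    return np.dot(x_dropped, W) + np.dot(h_dropped, U) + b

The point is only that one mask multiplies the input before the input-to-hidden weights W, and the other multiplies the previous state before the recurrent weights U.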