Does applying a Dropout Layer after the Embedding Layer have the same effect as applying the dropout through the LSTM dropout parameter?

I am slightly confused on the different ways to apply dropout to my Sequential model in Keras.

My model is the following:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Now assume that I add an extra Dropout layer after the Embedding layer, as shown below:

from keras.models import Sequential
from keras.layers import Embedding, Dropout, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(Dropout(0.25))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Will this make any difference, given that I already specified a dropout of 0.5 in the LSTM parameters, or am I getting this all wrong?

asked Mar 23 '18 by Danny

People also ask

Does dropout increase accuracy?

In one example, adding dropout layers increased the test accuracy from 76.92% to 80.77%. This is a good improvement and shows that the model performs well in both training and testing. Dropout regularization is therefore a common way to handle overfitting in deep learning models.

What does dropout layer do in LSTM?

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.
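To make that concrete, here is a minimal sketch of the zeroing and 1/(1 - rate) rescaling (this assumes TensorFlow 2.x's tf.keras, which the original question does not use):

import numpy as np
import tensorflow as tf

# Dropout with rate=0.5: during training, units are zeroed at random and the
# survivors are scaled by 1/(1 - 0.5) = 2.0 so the expected sum is unchanged.
layer = tf.keras.layers.Dropout(rate=0.5)
x = np.ones((1, 8), dtype="float32")

print(layer(x, training=True).numpy())   # a mix of 0.0 and 2.0
print(layer(x, training=False).numpy())  # all 1.0 -- dropout is a no-op at inference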

Should dropout be applied to every layer?

It can be applied after each layer of the network (regardless of whether it is fully connected or convolutional), or only after selected layers. Which layers dropout is applied to is really just a design decision based on what gives the best performance.

What layer do you apply for dropout?

Dropout can be used after convolutional layers (e.g. Conv2D) and after pooling layers (e.g. MaxPooling2D). Often, dropout is only used after the pooling layers, but this is just a rough heuristic. In this case, dropout is applied to each element or cell within the feature maps.
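As a rough illustration of that placement (layer sizes and shapes here are arbitrary, chosen only to make the sketch self-contained):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))   # dropout applied element-wise to the pooled feature maps
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])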


1 Answer

When you add a Dropout layer you are applying dropout to the output of the previous layer only; in your case, you are adding dropout to the output of your Embedding layer.

An LSTM cell is more complex than a single-layer neural network. When you specify dropout in the LSTM cell, you are actually applying dropout to 4 different sub-network operations inside the LSTM cell.

Below is a visualization of an LSTM cell from Colah's blog on LSTMs (the best visualization of LSTMs/RNNs out there, http://colah.github.io/posts/2015-08-Understanding-LSTMs/). The yellow boxes represent 4 fully connected network operations (each with its own weights) which occur under the hood of the LSTM - this is neatly wrapped up in the LSTM cell wrapper, though it's not really so hard to code by hand.

[Diagram of an LSTM cell from Colah's blog; the four yellow boxes are the fully connected gate operations.]

When you specify dropout=0.5 in the LSTM cell, what you are doing under the hood is applying dropout to each of these 4 neural network operations. This is effectively adding a Dropout(0.5) operation 4 times, once on the input feeding each of the 4 yellow blocks you see in the diagram, within the internals of the LSTM cell (recurrent_dropout does the same for the recurrent state).
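Here is a rough conceptual sketch of that idea in plain numpy - it is not Keras's actual implementation, and the weight names W, U, b and gate labels i/f/g/o are only illustrative - showing a fresh dropout mask applied to the input of each of the 4 gate computations:

import numpy as np

def dropped(x, rate=0.5):
    # inverted dropout: zero units at random and rescale the survivors
    mask = (np.random.rand(*x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b, dropout=0.5):
    # W, U, b hold the weights of the 4 gate operations (the 4 "yellow boxes"):
    # input gate (i), forget gate (f), cell candidate (g), output gate (o).
    i = sigmoid(dropped(x, dropout) @ W['i'] + h_prev @ U['i'] + b['i'])
    f = sigmoid(dropped(x, dropout) @ W['f'] + h_prev @ U['f'] + b['f'])
    g = np.tanh(dropped(x, dropout) @ W['g'] + h_prev @ U['g'] + b['g'])
    o = sigmoid(dropped(x, dropout) @ W['o'] + h_prev @ U['o'] + b['o'])

    c = f * c_prev + i * g      # new cell state
    h = o * np.tanh(c)          # new hidden state
    return h, c

In the same spirit, recurrent_dropout would apply that kind of masking to h_prev in each of the 4 gate computations.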

I hope that short discussion makes it clearer how the dropout applied via the LSTM wrapper, which effectively acts on 4 sub-networks within the LSTM, differs from the dropout you applied once in the sequence after your Embedding layer. To answer your question directly: yes, these two dropout definitions are very much different.

Note, as a further example to help make the point: if you were to define a simple 5-layer fully connected neural network, you would need to define dropout after each layer, not just once. model.add(Dropout(0.25)) is not some kind of global setting; it adds a dropout operation to a pipeline of operations. If you have 5 layers, you need to add 5 dropout operations, as in the sketch below.
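For example, a sketch of that 5-layer case (layer widths are arbitrary, chosen only for illustration):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(64,)))
model.add(Dropout(0.25))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])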

answered Nov 06 '22 by David Parks