Using Dropout with Keras and LSTM/GRU cell

Tags:

keras

lstm

dropout

In Keras you can specify a dropout layer like this:

model.add(Dropout(0.5))

But with a GRU cell you can specify the dropout as a parameter in the constructor:

model.add(GRU(units=512,
        return_sequences=True,
        dropout=0.5,
        input_shape=(None, features_size,)))

What's the difference? Is one preferable to the other?

The Keras documentation adds it as a separate dropout layer (see "Sequence classification with LSTM").

asked Jun 06 '18 12:06

BigBadMe

1 Answer

Recurrent layers perform the same operation over and over, once per timestep.

At each timestep, the layer takes two inputs:

Your inputs (a step of your sequence)
Internal inputs (can be states and the output of the previous step, for instance)

Note that the dimensions of the input and output may not match, which means that "your input" dimensions may not match "the recurrent input (previous step/states)" dimensions.

Then in every recurrent timestep there are two operations with two different kernels:

One kernel is applied to "your inputs" to transform them into a compatible dimension
Another (called the recurrent kernel by Keras) is applied to the recurrent inputs, i.e. the output of the previous step.
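
To make the two kernels concrete, here is a minimal NumPy sketch of a plain (non-gated) recurrent step. It is not Keras' implementation: GRU and LSTM cells add gates on top of this, and the sizes and the name simple_rnn_step are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, units = 3, 4  # illustrative sizes, not taken from the question

kernel = rng.normal(size=(input_dim, units))        # applied to "your inputs"
recurrent_kernel = rng.normal(size=(units, units))  # applied to the previous output

def simple_rnn_step(x_t, h_prev):
    """One timestep: project the input and the previous output, then combine."""
    return np.tanh(x_t @ kernel + h_prev @ recurrent_kernel)

sequence = rng.normal(size=(5, input_dim))  # 5 timesteps of input
h = np.zeros(units)                         # initial state
for x_t in sequence:
    h = simple_rnn_step(x_t, h)

print(h.shape)
```

Note how the input has 3 features while the state has 4 units: the first kernel is what brings the input into the same dimension as the recurrent state.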

Because of this, Keras also uses two dropout operations in the recurrent layers, both applied at every step:

A dropout for the first conversion of your inputs
A dropout for the application of the recurrent kernel

So, in fact there are two dropout parameters in RNN layers:

dropout, applied to the first operation on the inputs
recurrent_dropout, applied to the other operation on the recurrent inputs (previous output and/or states)
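
As a rough sketch of how the two parameters are passed, assuming TensorFlow's bundled Keras is installed (features_size, the unit count, and the rates below are arbitrary illustration values):

```python
import numpy as np
from tensorflow import keras

features_size = 8  # hypothetical feature count, for illustration only

gru = keras.layers.GRU(
    units=16,
    return_sequences=True,
    dropout=0.5,             # dropout on the layer inputs
    recurrent_dropout=0.25,  # dropout on the recurrent inputs
)
model = keras.Sequential([keras.Input(shape=(None, features_size)), gru])

batch = np.random.rand(2, 5, features_size).astype("float32")
train_out = model(batch, training=True)   # dropout masks are active
infer_out = model(batch, training=False)  # dropout is disabled at inference

config = gru.get_config()
print(config["dropout"], config["recurrent_dropout"])
```

Both rates end up in the layer config, and like any dropout they only have an effect when the layer is called in training mode.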

You can see this description coded in both GRUCell and LSTMCell, for instance, in the source code.

What is correct?

This is open to creativity.

You can use a Dropout(...) layer; it's not "wrong", but it will possibly drop "timesteps" too (unless you set noise_shape properly or use SpatialDropout1D, which was not yet documented at the time of writing).
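
A small sketch of the three variants mentioned above, assuming TensorFlow's bundled Keras. The sizes and the helper same_mask_every_step are made up for illustration, and the input is all ones so the masks are easy to read off the output:

```python
import numpy as np
from tensorflow import keras

batch, timesteps, features = 2, 5, 4  # illustrative sizes
x = np.ones((batch, timesteps, features), dtype="float32")

# Default Dropout: each (sample, timestep, feature) entry is masked independently.
elementwise = np.asarray(keras.layers.Dropout(0.5)(x, training=True))

# noise_shape with a 1 on the time axis: one mask per sample, broadcast over timesteps.
shared = np.asarray(
    keras.layers.Dropout(0.5, noise_shape=(batch, 1, features))(x, training=True)
)

# SpatialDropout1D: drops whole feature channels across all timesteps.
spatial = np.asarray(keras.layers.SpatialDropout1D(0.5)(x, training=True))

def same_mask_every_step(y):
    """True if every timestep has the same pattern as the first one."""
    return bool(np.all(y == y[:, :1, :]))

print(same_mask_every_step(shared), same_mask_every_step(spatial))
```

With the shared mask or SpatialDropout1D, a dropped feature is dropped for the whole sequence; with the default layer the pattern generally differs from step to step.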

Maybe you want that, maybe you don't. If you use the parameters in the recurrent layer, you apply dropout only to the other dimensions, without dropping a single step. This seems healthy for recurrent layers, unless you want your network to learn how to deal with sequences containing gaps (this last sentence is a supposition).

Also, with the dropout parameters, you really drop parts of the kernel, because the operations are dropped "in every step", whereas a separate layer lets your RNN perform non-dropped operations internally, since the dropout affects only the final output.
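
The difference can be sketched in NumPy with a plain (non-gated) RNN. This is a simplification rather than Keras' actual code: it assumes one mask is sampled per sequence and reused at every step, and all names and sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, input_dim, units, rate = 6, 3, 4, 0.5  # illustrative values

kernel = rng.normal(size=(input_dim, units))
recurrent_kernel = rng.normal(size=(units, units))
sequence = rng.normal(size=(timesteps, input_dim))

def make_mask(size):
    # Inverted dropout: kept entries are scaled by 1 / (1 - rate).
    return (rng.random(size) >= rate) / (1.0 - rate)

def run_rnn(x_mask, h_mask):
    """Run the recurrence, masking the inputs and the previous output at each step."""
    h = np.zeros(units)
    outputs = []
    for x_t in sequence:
        h = np.tanh((x_t * x_mask) @ kernel + (h * h_mask) @ recurrent_kernel)
        outputs.append(h)
    return np.stack(outputs)

# (a) Dropout inside the layer: both operations are disturbed at every step.
internal = run_rnn(make_mask(input_dim), make_mask(units))

# (b) Separate dropout layer: the recurrence runs undisturbed,
#     and only the final outputs are masked afterwards.
clean = run_rnn(np.ones(input_dim), np.ones(units))
external = clean * make_mask(clean.shape)

print(internal.shape, external.shape)
```

In (b) every surviving output is just a rescaled copy of the clean value, while in (a) the masks change what the recurrence itself computes.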

answered Oct 22 '22 21:10

Daniel Möller
