 

Using Dropout with Keras and LSTM/GRU cell

In Keras you can specify a dropout layer like this:

model.add(Dropout(0.5))

But with a GRU cell you can specify the dropout as a parameter in the constructor:

model.add(GRU(units=512,
        return_sequences=True,
        dropout=0.5,
        input_shape=(None, features_size,)))

What's the difference? Is one preferable to the other?

The Keras documentation adds it as a separate dropout layer (see "Sequence classification with LSTM").

BigBadMe asked Jun 06 '18

People also ask

Can I use dropout with LSTM?

After the LSTM you have shape = (None, 10) (e.g. an LSTM with 10 units returning only its last output). So you use Dropout the same way you would use it in any fully connected network.
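
For example, a minimal sketch (the layer sizes here are made up) of applying a separate Dropout layer to the 2D output of an LSTM, exactly as in a fully connected network:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(20, 8)))   # returns last output only -> shape (None, 10)
model.add(Dropout(0.5))                    # ordinary dropout on that 2D tensor
model.add(Dense(1, activation='sigmoid'))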

What is dropout in LSTM Keras?

Dropout is easily implemented by randomly selecting nodes to be dropped out with a given probability (e.g., 20%) in each weight update cycle. This is how Dropout is implemented in Keras. Dropout is only used during the training of a model and is not used when evaluating the skill of the model.
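
A small illustrative sketch of that (assuming tf.keras; the input values are arbitrary): the mask is only applied when the layer is called in training mode.

import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.2)
x = np.ones((1, 10), dtype="float32")

print(layer(x, training=False).numpy())  # inference: values pass through unchanged
print(layer(x, training=True).numpy())   # training: roughly 20% zeroed, the rest scaled by 1/0.8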

How does dropout work in LSTM?

Dropout is a regularization method where input and recurrent connections to LSTM units are probabilistically excluded from activation and weight updates while training a network. This has the effect of reducing overfitting and improving model performance.

What is dropout and recurrent dropout in LSTM?

Recurrent Dropout is a regularization method for recurrent neural networks. Dropout is applied to the updates to LSTM memory cells (or GRU states), i.e. it drops out the input/update gate in LSTM/GRU.
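
For example, a hedged sketch (the layer size is made up) of enabling it through the constructor argument:

from tensorflow.keras.layers import LSTM

layer = LSTM(units=64,
             recurrent_dropout=0.2,   # dropout on the recurrent (state-to-state) connections
             return_sequences=True)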


1 Answer

Recurrent layers perform the same operation repeatedly along the sequence.

At each timestep, the layer takes two inputs:

  • Your inputs (a step of your sequence)
  • Internal inputs (can be states and the output of the previous step, for instance)

Note that the dimensions of the input and output may not match, which means that the dimensions of "your input" will not match the dimensions of "the recurrent input" (previous step/states).

Then in every recurrent timestep there are two operations with two different kernels:

  • One kernel is applied to "your inputs" to process and transform them into a compatible dimension
  • Another (called the recurrent kernel by Keras) is applied to the inputs of the previous step.

Because of this, Keras also uses two dropout operations in the recurrent layers (dropouts that are applied at every step):

  • A dropout for the first conversion of your inputs
  • A dropout for the application of the recurrent kernel

So, in fact there are two dropout parameters in RNN layers:

  • dropout, applied to the first operation on the inputs
  • recurrent_dropout, applied to the other operation on the recurrent inputs (previous output and/or states)
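
As a simplified illustration (plain numpy, a vanilla RNN step rather than a full LSTM/GRU cell; the names are made up), this is roughly where the two masks act:

import numpy as np

def rnn_step(x_t, h_prev, kernel, recurrent_kernel, bias,
             input_mask, recurrent_mask):
    # mask from the `dropout` argument, applied to this step's input
    x_t = x_t * input_mask
    # mask from the `recurrent_dropout` argument, applied to the previous state
    h_prev = h_prev * recurrent_mask
    return np.tanh(x_t @ kernel + h_prev @ recurrent_kernel + bias)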

You can see this description implemented in both GRUCell and LSTMCell in the source code, for instance.
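
If you want to look at that source directly (assuming TensorFlow's bundled Keras), something like this prints the relevant step functions:

import inspect
import tensorflow as tf

print(inspect.getsource(tf.keras.layers.GRUCell.call))   # shows the dropout masks being applied
print(inspect.getsource(tf.keras.layers.LSTMCell.call))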


What is correct?

This is open to creativity.

You can use a Dropout(...) layer; it's not "wrong", but it will possibly drop "timesteps" too! (Unless you set noise_shape properly or use SpatialDropout1D, which is not yet documented.)
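
For instance, a sketch (made-up sizes, assuming tf.keras) of those two ways to avoid dropping whole timesteps between recurrent layers:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dropout, SpatialDropout1D, Dense

model = Sequential()
model.add(GRU(64, return_sequences=True, input_shape=(None, 10)))
# Option 1: share one mask across the time axis, so features are dropped but no timestep is lost
model.add(Dropout(0.5, noise_shape=(None, 1, 64)))
# Option 2 (use instead of the line above): drop entire feature channels for all timesteps
# model.add(SpatialDropout1D(0.5))
model.add(GRU(32))
model.add(Dense(1))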

Maybe you want that, maybe you don't. If you use the parameters in the recurrent layer, you will be applying dropout only to the other dimensions, without dropping a single step. This seems healthy for recurrent layers, unless you want your network to learn how to deal with sequences containing gaps (this last sentence is a supposition).

Also, with the dropout parameters you will really be dropping parts of the kernel, because the operations are dropped "in every step", while a separate layer lets your RNN perform its internal operations without dropout, since the dropout affects only the final output.

Daniel Möller answered Oct 22 '22