 

Tensorflow LSTM Dropout Implementation

  • How specifically does TensorFlow apply dropout when calling tf.nn.rnn_cell.DropoutWrapper()?

Everything I read about applying dropout to RNNs references this paper by Zaremba et al., which says don't apply dropout to the recurrent connections. Neurons should be dropped out randomly before or after LSTM layers, but not on the recurrent (timestep-to-timestep) connections inside an LSTM layer. Ok.
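For reference, here is a minimal sketch of that setup in the TF 1.x API the question uses: dropout is applied only to the non-recurrent connections (the output fed up to the next layer), while the state carried forward in time is untouched. The layer count, hidden size, keep probability, and input shape are all assumptions, not values from the paper or the question.

```python
import tensorflow as tf

num_units = 128   # assumed hidden size
keep_prob = 0.5   # assumed keep probability

cells = []
for _ in range(2):  # two stacked LSTM layers (assumed)
    cell = tf.nn.rnn_cell.LSTMCell(num_units)
    # Dropout on the output that feeds the next layer (a non-recurrent
    # connection); the recurrent state passed forward in time is left alone.
    cells.append(tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob))

stacked = tf.nn.rnn_cell.MultiRNNCell(cells)
inputs = tf.placeholder(tf.float32, [None, 20, 50])  # [batch, time, features], assumed
outputs, final_state = tf.nn.dynamic_rnn(stacked, inputs, dtype=tf.float32)
```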

  • The question I have is how are the neurons turned off with respect to time?

In the paper that everyone cites, it seems that a random 'dropout mask' is sampled at each timestep, rather than sampling one 'dropout mask' per sequence and reusing it at every timestep of the layer being dropped out, then sampling a new mask for the next batch.
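To make that distinction concrete, here is a small illustration (not TensorFlow's implementation) of the two masking schemes; the shapes, keep probability, and dummy activations are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 5, 4            # timesteps, hidden units (assumed)
keep_prob = 0.5
h = np.ones((T, H))    # stand-in for per-timestep activations

# Scheme A: a fresh mask sampled at every timestep.
per_step = np.stack([h[t] * (rng.random(H) < keep_prob) / keep_prob
                     for t in range(T)])

# Scheme B: one mask sampled per sequence and reused at every timestep
# (the reuse-one-mask style described above).
mask = (rng.random(H) < keep_prob) / keep_prob
per_sequence = h * mask          # same mask broadcast over all timesteps
```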

Further, and probably what matters more at the moment, how does TensorFlow do it? I've checked the TensorFlow API and tried searching around for a detailed explanation, but have yet to find one.

  • Is there a way to dig into the actual TensorFlow source code?
asked Feb 27 '17 by beeCwright

People also ask

Can I use dropout with LSTM?

After the LSTM you have shape = (None, 10). So you use Dropout the same way you would use it in any fully connected network.
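A minimal Keras sketch of that pattern, assuming the LSTM width of 10 quoted above; the input feature size, dropout rate, and final Dense layer are arbitrary assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(10, input_shape=(None, 8)),  # output shape: (None, 10)
    tf.keras.layers.Dropout(0.5),                      # ordinary dropout on that output
    tf.keras.layers.Dense(1),
])
```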

How does dropout work in LSTM?

Dropout is a regularization method where input and recurrent connections to LSTM units are probabilistically excluded from activation and weight updates while training a network. This has the effect of reducing overfitting and improving model performance.
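A hedged sketch of the two kinds of dropout described above, using the Keras LSTM arguments: `dropout` masks the input connections and `recurrent_dropout` masks the recurrent connections. The unit count and rates are assumptions.

```python
import tensorflow as tf

lstm = tf.keras.layers.LSTM(
    64,
    dropout=0.2,            # dropout on the input transformation
    recurrent_dropout=0.2,  # dropout on the recurrent (state-to-state) transformation
)
```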

How does dropout work in Tensorflow?

dropout() is a built-in function of the TensorFlow.js library. It is used to prevent overfitting in a model by randomly setting a fraction rate of input units to 0 at each update during training time.

What is dropout layer in Tensorflow?

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.
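A quick way to see that behaviour in tf.keras (the rate and input tensor here are arbitrary): at inference the input passes through unchanged, while during training surviving units are scaled by 1/(1 - rate).

```python
import tensorflow as tf

layer = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 10))

print(layer(x, training=False))  # inference: all ones, unchanged
print(layer(x, training=True))   # training: zeros and 2.0s, since 1 / (1 - 0.5) = 2
```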


1 Answer

You can check the implementation here.

It applies the dropout op to the input going into the RNNCell, then to its output, with the keep probabilities you specify.
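A rough sketch of that flow (not the actual TensorFlow source) for a single timestep; the function name and arguments here are hypothetical, only the input-then-output use of the dropout op follows the description above:

```python
import tensorflow as tf

def dropout_wrapped_step(cell, inputs, state,
                         input_keep_prob=1.0, output_keep_prob=1.0):
    """One timestep of a dropout-wrapped cell (illustrative only)."""
    if input_keep_prob < 1.0:
        inputs = tf.nn.dropout(inputs, keep_prob=input_keep_prob)    # mask the input
    output, new_state = cell(inputs, state)                          # run the wrapped cell
    if output_keep_prob < 1.0:
        output = tf.nn.dropout(output, keep_prob=output_keep_prob)   # mask the output
    return output, new_state
```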

It seems like each sequence you feed in gets a new mask for the input and another for the output, with no changes inside of the sequence.

answered Sep 19 '22 by Robert Lacok