
What exactly does 'tf.contrib.rnn.DropoutWrapper' in TensorFlow do? (three critical questions)

As far as I know, DropoutWrapper has the following constructor:

__init__(
    cell,
    input_keep_prob=1.0,
    output_keep_prob=1.0,
    state_keep_prob=1.0,
    variational_recurrent=False,
    input_size=None,
    dtype=None,
    seed=None
)

A typical usage looks like this:

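# Build a fresh cell per layer; [cell] * num_layers would reuse one cell object for every layer.
def make_cell():
    cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
    return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)

cell = tf.nn.rnn_cell.MultiRNNCell([make_cell() for _ in range(num_layers)], state_is_tuple=True)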

The only thing I know is that it is used for dropout during training. Here are my three questions:

  1. What are input_keep_prob, output_keep_prob and state_keep_prob, respectively? (I guess they define the dropout probability for each part of the RNN, but where exactly?)

  2. Is dropout in this context applied to the RNN not only during training but also during prediction? If so, is there any way to choose whether or not to use dropout during prediction?

  3. According to the TensorFlow API documentation, if variational_recurrent=True, dropout follows the method from the paper Y. Gal, Z. Ghahramani, "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks", https://arxiv.org/abs/1512.05287. I understand this paper roughly. When I train an RNN, I use a batch, not a single time series. In this case, does TensorFlow automatically assign a different dropout mask to each time series in the batch?
Eric asked Aug 04 '17 12:08



1 Answer

  1. input_keep_prob is the keep (inclusion) probability applied to the inputs of the cell. output_keep_prob is the keep probability applied to the output of each RNN unit. state_keep_prob is the keep probability applied to the hidden state that is passed on to the next time step.
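  2. Dropout is applied whenever a keep probability is below 1, so you decide whether it is active at prediction time. You can set each of the above parameters through a placeholder that defaults to 1.0:
import tensorflow as tf  # TensorFlow 1.x API

n_hidden_rnn = 128  # example size, not from the original answer
# shape=() makes this a scalar; it evaluates to 1.0 (no dropout) unless a value is fed.
dropout_placeholder = tf.placeholder_with_default(1.0, shape=())
cell = tf.nn.rnn_cell.DropoutWrapper(
    tf.nn.rnn_cell.BasicRNNCell(n_hidden_rnn),
    input_keep_prob=dropout_placeholder,
    output_keep_prob=dropout_placeholder,
    state_keep_prob=dropout_placeholder)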

The keep probability defaults to 1 (no dropout) during prediction, and we can feed any other value during training.

  3. The mask is applied to the units (inputs, outputs and states), not drawn separately for each sequence in the batch. As far as I know, the same mask is used for the entire batch.
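To make points 1 and 2 concrete, here is a minimal framework-free sketch of where the three keep probabilities act within one time step. It is only an illustration under stated assumptions, not TensorFlow's implementation: toy_cell and dropout_step are hypothetical names, and the toy cell simply averages its input and state.

```python
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob, scale survivors by 1/keep_prob."""
    if keep_prob >= 1.0:
        return list(values)  # a keep probability of 1.0 means no dropout at all
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def toy_cell(x, h):
    """Hypothetical stand-in for an RNN cell: returns (output, new_state)."""
    new_h = [0.5 * xi + 0.5 * hi for xi, hi in zip(x, h)]
    return new_h, new_h


def dropout_step(x, h, rng, input_keep_prob=1.0, output_keep_prob=1.0, state_keep_prob=1.0):
    """One time step of a cell with dropout on its input, output and outgoing state."""
    x = dropout(x, input_keep_prob, rng)          # what enters the cell
    out, new_h = toy_cell(x, h)
    out = dropout(out, output_keep_prob, rng)     # what is handed to the next layer
    new_h = dropout(new_h, state_keep_prob, rng)  # what is carried to the next time step
    return out, new_h


rng = random.Random(0)
x, h = [1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]

# Prediction: every keep probability stays at its default of 1.0, so nothing is dropped.
pred_out, pred_h = dropout_step(x, h, rng)

# Training: only the output is dropped; surviving entries are doubled (1 / 0.5).
train_out, train_h = dropout_step(x, h, rng, output_keep_prob=0.5)
```

With the defaults the step is deterministic, which mirrors leaving the placeholder unfed at prediction time. Passing a value below 1 plays the role of feeding the placeholder during training.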
omer sagi answered Nov 12 '22 13:11
