
What's the difference between two implementations of RNN in tensorflow?

I have found two kinds of implementations of an RNN in TensorFlow.

The first implementation is this (from line 124 to 129). It uses a loop to define each step of input to the RNN.

with tf.variable_scope("RNN"):
      for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
        states.append(state)

The second implementation is this (from line 51 to 70). It doesn't use any explicit loop to define each step of input to the RNN.

def RNN(_X, _istate, _weights, _biases):

    # input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, _X, initial_state=_istate)

    # Linear activation
    # Get inner loop last output
    return tf.matmul(outputs[-1], _weights['out']) + _biases['out']



In the first implementation, I find there is no weight matrix between the input units and the hidden units; only the weight matrix between the hidden units and the output units is defined (from line 132 to 133):

output = tf.reshape(tf.concat(1, outputs), [-1, size])
softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b

But in the second implementation, both weight matrices are defined (from line 42 to 47):

weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

I wonder why?

asked May 16 '16 by Nils Cao

1 Answer

The difference I noticed is that the second implementation uses tf.nn.rnn, which takes a list of inputs (one tensor per time step) and generates a list of outputs (one tensor per time step).

(Inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size].)

So, if you check the code in the second implementation on line 62, the input data is shaped into n_steps * (batch_size, n_hidden):

# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
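
Conceptually, rnn.rnn performs the same per-time-step loop internally that the first implementation writes out by hand. Here is a simplified sketch of that inner loop (my own illustration, not the actual library source; it ignores details such as variable-scope reuse and sequence lengths):

def simple_unrolled_rnn(cell, inputs_list, initial_state):
    # inputs_list: n_steps tensors, each of shape (batch_size, n_hidden)
    state = initial_state
    outputs = []
    for step_input in inputs_list:
        output, state = cell(step_input, state)  # one RNN step
        outputs.append(output)
    return outputs, state  # per-step outputs plus the resulting state

So both implementations build essentially the same unrolled graph; the second one just delegates the loop to the library.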

In the first implementation, they loop over num_steps, feed the input for each time step, get the corresponding output, and store it in the outputs list.

Code snippet from line 113 to 117:

outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
  for time_step in range(num_steps):
    if time_step > 0: tf.get_variable_scope().reuse_variables()
    (cell_output, state) = cell(inputs[:, time_step, :], state)
    outputs.append(cell_output)

Coming to your second question:

Notice carefully how the inputs are fed to the RNN in the two implementations.

In the first implementation, the inputs that reach the cell already have the hidden size; the placeholder itself only holds word indices of shape [batch_size, num_steps]:

self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
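
If I recall the PTB tutorial correctly, those integer word ids are then passed through an embedding lookup whose output dimension is already the hidden size, which is why no separate input-to-hidden matrix shows up later. Roughly (paraphrased from memory of that tutorial, so treat the exact names as approximate):

embedding = tf.get_variable("embedding", [vocab_size, size])    # size == hidden size
inputs = tf.nn.embedding_lookup(embedding, self._input_data)    # (batch_size, num_steps, size)
# each inputs[:, time_step, :] slice already has the hidden dimension,
# so cell(inputs[:, time_step, :], state) needs no extra projection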

Whereas in the second implementation, the initial inputs have shape (batch_size, n_steps, n_input), so a 'hidden' weight matrix is required to project the n_input features to n_hidden before the data is split into the per-step list:

    # Input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # Permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)

I hope this is helpful...

answered Oct 18 '22 by Aravind Pilla