 

Output of Tensorflow LSTM-Cell

I've got a question about the TensorFlow LSTM implementation. There are currently several implementations in TF, but I use:

cell = tf.contrib.rnn.BasicLSTMCell(n_units)
  • where n_units is the number of 'parallel' LSTM cells.

Then to get my output I call:

 rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x,
                        initial_state=initial_state, time_major=False)
  • where (as time_major=False) x is of shape (batch_size, time_steps, input_length)
  • where batch_size is my batch size
  • where time_steps is the number of timesteps my RNN will go through
  • where input_length is the length of one of my input vectors (the vector fed into the network at one specific timestep in one specific batch)
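To make the shapes concrete, here is a minimal NumPy sketch of an LSTM rolled out over time (illustrative sizes, randomly initialized weights, and a simplified gate layout; this is not TensorFlow's actual implementation). It shows that each timestep's output is a vector of length n_units, independent of input_length:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep.
    x_t: (batch, input_length); h_prev, c_prev: (batch, n_units)
    W: (input_length + n_units, 4 * n_units), b: (4 * n_units,)"""
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i, f, o, g = np.split(z, 4, axis=1)        # input, forget, output gates; candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

batch_size, time_steps, input_length, n_units = 2, 5, 3, 7
rng = np.random.default_rng(0)
x = rng.normal(size=(batch_size, time_steps, input_length))
W = rng.normal(size=(input_length + n_units, 4 * n_units))
b = np.zeros(4 * n_units)
h = np.zeros((batch_size, n_units))
c = np.zeros((batch_size, n_units))

outputs = []
for t in range(time_steps):
    h, c = lstm_step(x[:, t, :], h, c, W, b)
    outputs.append(h)
rnn_outputs = np.stack(outputs, axis=1)
print(rnn_outputs.shape)  # (2, 5, 7) -> (batch_size, time_steps, n_units)
```

Note that the output shape has n_units as its last dimension, not input_length, which foreshadows the answer below.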

I expect rnn_outputs to be of shape (batch_size, time_steps, n_units, input_length), as I have not specified another output size. The documentation of nn.dynamic_rnn tells me that the output is of shape (batch_size, input_length, cell.output_size). The documentation of tf.contrib.rnn.BasicLSTMCell does have a property output_size, which defaults to n_units (the number of LSTM cells I use).

So does each LSTM cell only output a scalar for each timestep? I would expect it to output a vector of the length of the input vector. From how I understand it right now that does not seem to be the case, so I am confused. Can you tell me whether that is so, or how I could change it so that each LSTM cell outputs a vector the size of the input vector?

LJKS asked Feb 26 '17



1 Answer

I think the primary confusion is over the terminology of the LSTM cell's argument: num_units. Unfortunately it does not mean, as the name suggests, "the number of LSTM cells" (which would then have to equal your number of time steps). It is actually the dimensionality of the hidden state (both the cell state and the hidden state vector). The call to dynamic_rnn() returns a tensor of shape [batch_size, time_steps, output_size], where:

(Please note this)
  • output_size = num_units, if num_proj = None in the LSTM cell;
  • output_size = num_proj, if it is defined.

Now, typically, you extract the last time step's result and project it to the size of your output dimensions, either manually with a matmul + biases operation or by using the num_proj argument of the LSTM cell.
I went through the same confusion and had to dig quite deep to clear it up. I hope this answer clears some of it for you.
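The manual projection described above can be sketched as follows (hypothetical sizes; rnn_outputs is a random stand-in for the tensor returned by dynamic_rnn):

```python
import numpy as np

batch_size, time_steps, num_units, output_dims = 4, 10, 32, 5
rng = np.random.default_rng(1)

# Stand-in for dynamic_rnn's return value: (batch_size, time_steps, num_units)
rnn_outputs = rng.normal(size=(batch_size, time_steps, num_units))

# Take the last timestep's hidden state and project it to the desired output
# size, as one would with a matmul + biases in TF.
last = rnn_outputs[:, -1, :]                    # (batch_size, num_units)
W = rng.normal(size=(num_units, output_dims))   # projection weights
b = np.zeros(output_dims)                       # projection biases
logits = last @ W + b
print(logits.shape)  # (4, 5) -> (batch_size, output_dims)
```

Alternatively, passing num_proj to the cell applies such a projection inside the LSTM at every timestep, so output_size becomes num_proj directly.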

Animesh Karnewar answered Sep 20 '22