I've got a question about TensorFlow's LSTM implementation. There are currently several implementations in TF, but I use:
cell = tf.contrib.rnn.BasicLSTMCell(n_units)
Then to get my output I call:
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x,
                                            initial_state=initial_state,
                                            time_major=False)

x is of shape (batch_size, time_steps, input_length), where:

- batch_size is my batch size
- time_steps is the number of timesteps my RNN will go through
- input_length is the length of one of my input vectors (the vector fed into the network at one specific timestep in one specific batch)

I expect rnn_outputs to be of shape (batch_size, time_steps, n_units, input_length), as I have not specified another output size.
The documentation of tf.nn.dynamic_rnn tells me that the output is of shape (batch_size, input_length, cell.output_size).

The documentation of tf.contrib.rnn.BasicLSTMCell does have a property output_size, which defaults to n_units (the number of LSTM cells I use).
So does each LSTM cell only output a scalar for every given timestep? I would expect it to output a vector of the length of the input vector. That does not seem to be the case from how I understand it right now, so I am confused. Can you tell me whether that's the case, or how I could change it so that each LSTM cell outputs a vector of the size of the input vector?
It outputs a vector of probabilities that we multiply with the previous cell state c^(t-1).

The inputs to an LSTM cell are the cell state from the previous step, c^(t-1), the output of the previous LSTM cell, a^(t-1), and the current input x^(t). Its outputs are the current cell state c^(t) and the cell output a^(t).

Given these inputs, the LSTM cell produces two outputs: a "true" output and a new hidden state. [Figure: the structure of an LSTM cell/module/unit]
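In standard notation (using a for the hidden/output state, as above, and ⊙ for elementwise multiplication), the step described can be written as:

```latex
\begin{aligned}
f^{\langle t\rangle} &= \sigma\!\left(W_f\,[a^{\langle t-1\rangle},\,x^{\langle t\rangle}] + b_f\right) && \text{forget gate} \\
i^{\langle t\rangle} &= \sigma\!\left(W_i\,[a^{\langle t-1\rangle},\,x^{\langle t\rangle}] + b_i\right) && \text{input gate} \\
o^{\langle t\rangle} &= \sigma\!\left(W_o\,[a^{\langle t-1\rangle},\,x^{\langle t\rangle}] + b_o\right) && \text{output gate} \\
\tilde{c}^{\langle t\rangle} &= \tanh\!\left(W_c\,[a^{\langle t-1\rangle},\,x^{\langle t\rangle}] + b_c\right) && \text{candidate cell state} \\
c^{\langle t\rangle} &= f^{\langle t\rangle}\odot c^{\langle t-1\rangle} + i^{\langle t\rangle}\odot \tilde{c}^{\langle t\rangle} \\
a^{\langle t\rangle} &= o^{\langle t\rangle}\odot \tanh\!\left(c^{\langle t\rangle}\right)
\end{aligned}
```

The forget gate f^(t) is the "vector of probabilities" mentioned above: each entry is in (0, 1) and scales the corresponding entry of the previous cell state c^(t-1).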
In deep learning, Recurrent Neural Networks (RNNs) are a family of neural networks that excel at learning from sequential data. A class of RNN that has found practical application is the Long Short-Term Memory (LSTM), because it is robust against the problem of long-term dependencies.
I think the primary confusion is about the terminology of the LSTM cell's argument num_units. Unfortunately it doesn't mean, as the name might suggest, "the number of LSTM cells", which would then equal your number of time steps. It actually corresponds to the number of dimensions in the hidden state (the cell state and the hidden state vector).
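To make this concrete, here is a minimal NumPy sketch of one LSTM layer (an illustration, not TensorFlow's actual implementation; the weight initialization and the concrete sizes are assumptions). The key point is that every gate's weight matrix maps the concatenation [h, x] to n_units dimensions, so the output's last dimension is n_units, regardless of input_length:

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch)
batch_size, time_steps, input_length, n_units = 4, 7, 10, 32

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each mapping [h, x] -> n_units
rng = np.random.default_rng(0)
W = rng.standard_normal((4, input_length + n_units, n_units)) * 0.1
b = np.zeros((4, n_units))

def lstm_step(x_t, h_prev, c_prev):
    """One timestep: concatenate h and x, then compute the four gates."""
    hx = np.concatenate([h_prev, x_t], axis=1)  # (batch, n_units + input_length)
    i = sigmoid(hx @ W[0] + b[0])               # input gate
    f = sigmoid(hx @ W[1] + b[1])               # forget gate
    o = sigmoid(hx @ W[2] + b[2])               # output gate
    g = np.tanh(hx @ W[3] + b[3])               # candidate cell state
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c                                 # both (batch, n_units)

x = rng.standard_normal((batch_size, time_steps, input_length))
h = np.zeros((batch_size, n_units))
c = np.zeros((batch_size, n_units))
outputs = []
for t in range(time_steps):
    h, c = lstm_step(x[:, t, :], h, c)
    outputs.append(h)
rnn_outputs = np.stack(outputs, axis=1)
print(rnn_outputs.shape)  # (4, 7, 32) == (batch_size, time_steps, n_units)
```

Note that input_length only affects the weight shapes, not the output shape: the hidden state h is always a vector of n_units entries per batch element.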
The call to dynamic_rnn() returns a tensor of shape [batch_size, time_steps, output_size], where (please note this):

output_size = num_units, if num_proj is None in the LSTM cell;
output_size = num_proj, if it is defined.
Now, typically, you will extract the result of the last time step and project it to the size of your output dimensions, either manually with a matmul + biases operation, or by using the num_proj argument of the LSTM cell.
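The manual "last time step + matmul + biases" projection can be sketched as follows (the tensor here is random stand-in data, and n_classes is an assumed output size):

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, time_steps, num_units, n_classes = 4, 7, 32, 3

# Stand-in for the [batch_size, time_steps, num_units] tensor from dynamic_rnn()
rnn_outputs = rng.standard_normal((batch_size, time_steps, num_units))
last_step = rnn_outputs[:, -1, :]          # (batch_size, num_units)

# Manual matmul + biases projection to the desired output size
W_out = rng.standard_normal((num_units, n_classes)) * 0.1
b_out = np.zeros(n_classes)
logits = last_step @ W_out + b_out         # (batch_size, n_classes)
print(logits.shape)  # (4, 3)
```

Using num_proj instead would apply an analogous projection inside the cell at every time step, so the returned tensor's last dimension would already be num_proj.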
I have been through the same confusion and had to look really deep to get it cleared. Hope this answer clears some of it.