I have searched everywhere, but I couldn't find what num_units in TensorFlow actually is. I tried to relate my question to this question, but I couldn't get a clear explanation there.
In TensorFlow, when creating an LSTM-based RNN, we use the following command
cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)
As Colah's blog says, this is a basic LSTM cell:
[figure: a single LSTM cell, from Colah's blog]
Now, suppose my data is:
idx2char = ['h', 'i', 'e', 'l', 'o']
# Teach hello: hihell -> ihello
x_data = [[0, 1, 0, 2, 3, 3]] # hihell
x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3
y_data = [[1, 0, 2, 3, 3, 4]] # ihello
My input is the x_one_hot tensor above, which is of shape [6, 5] per example: six time steps, each a 5-dimensional one-hot vector.
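(As a quick sanity check, here is a minimal NumPy sketch of those shapes; the batch dimension of size 1 comes from the outer pair of brackets:)

import numpy as np

# x_one_hot as defined above: one example, six 5-dimensional one-hot rows
x = np.array(x_one_hot, dtype=np.float32)
print(x.shape)     # (1, 6, 5): batch_size=1, time_steps=6, n_input=5
print(x[0].shape)  # (6, 5): the [6, 5] matrix discussed here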
In this blog, we have the following picture:
[figure: an RNN unrolled across time steps]
As far as I know, the BasicLSTMCell will unroll for t time steps, where t is my number of rows (please correct me if I am wrong!). For example, in the following figure the LSTM is unrolled for t = 28 time steps:
[figure: an LSTM unrolled for t = 28 time steps]
In Colah's blog, it is written that "each line carries an entire vector".
So, let's see how my [6, 5] input matrix will go through this LSTM-based RNN.
If my diagram above is correct, then what exactly is num_units (which we defined in the LSTM cell)? Is it a parameter of an LSTM cell?
If num_units is a parameter of a single LSTM cell, then it should be something like this:
[figure: my sketch of num_units inside a single LSTM cell]
If the above diagram is correct, then where are those 5 num_units in the following schematic representation of the LSTM cell (according to Colah's blog)?
[figure: Colah's schematic of a single LSTM cell]
If you can give your answer with a figure, that would be really helpful!
The number of units is the number of neurons connected to the layer that holds the concatenated vector of hidden state and input (the layer holding both the red and green circles below). In this example, there are 2 neurons connected to that layer.
[figure: 2 units connected to the layer of red and green circles]
The cell state is meant to encode a kind of aggregation of data from all previous time-steps that have been processed, while the hidden state is meant to encode a kind of characterization of the previous time-step's data.
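For concreteness, here is a minimal sketch (assuming TensorFlow 1.x, where state_is_tuple=True makes the cell return an LSTMStateTuple) showing that both states carry num_units values per example:

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5, state_is_tuple=True)
x = tf.placeholder(tf.float32, [1, 6, 5])  # [batch_size, time_steps, n_input]
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

print(state.c.shape)  # (1, 5): cell state, the running aggregate
print(state.h.shape)  # (1, 5): hidden state from the last time step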
num_units can be interpreted as analogous to the hidden layer of a feed-forward neural network: the number of nodes in the hidden layer of a feed-forward network is equivalent to the num_units LSTM units in an LSTM cell at every time step of the network.
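To make the analogy concrete, here is a rough sketch (TensorFlow 1.x assumed): a feed-forward hidden layer with 5 nodes turns one input vector into a 5-dimensional activation, and an LSTM cell with num_units=5 does the same for every time step:

import tensorflow as tf

x_step = tf.placeholder(tf.float32, [1, 5])       # a single time step's input
hidden = tf.layers.dense(x_step, units=5)         # feed-forward hidden layer, 5 nodes
print(hidden.shape)                               # (1, 5)

x_seq = tf.placeholder(tf.float32, [1, 6, 5])     # the full sequence
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5)  # the recurrent "hidden layer"
outputs, _ = tf.nn.dynamic_rnn(cell, x_seq, dtype=tf.float32)
print(outputs.shape)                              # (1, 6, 5): a 5-dim vector per step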
Your understanding is quite correct. However, unfortunately, there is an inconsistency between the TensorFlow terminology and the literature; to see it, you need to dig through the TensorFlow implementation code.
A cell in the TensorFlow universe is called an LSTM layer in Colah's universe (i.e., the unrolled version). That is why you always define a single cell, and not a layer, in your TensorFlow architecture. For example,
cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)
Check the code here.
https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L90
The definition of cell in this package differs from the definition used in the literature. In the literature, cell refers to an object with a single scalar output. The definition in this package refers to a horizontal array of such units.
Therefore, in order to understand num_units in TensorFlow, it's best to imagine an unrolled LSTM as below:
[figure: an unrolled LSTM with vector inputs X_t]
In the unrolled version, you have an input X_t, which is a tensor. When you specify an input of shape [batch_size, time_steps, n_input] to TensorFlow, it knows how many times to unroll it from your time_steps parameter.
So if you have X_t as a 1D array in TensorFlow, then in Colah's unrolled version each LSTM cell input x_t becomes a scalar value (note the uppercase X for the vector/array and the lowercase x for the scalar, as in Colah's figures). If you have X_t as a 2D array in TensorFlow, then in Colah's unrolled version each x_t becomes a 1D array/vector (as in your case here), and so on.
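A small sketch of that unrolling (TensorFlow 1.x; tf.nn.static_rnn is used here because it makes the per-step applications explicit):

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5)
x = tf.placeholder(tf.float32, [1, 6, 5])  # [batch_size, time_steps, n_input]

x_steps = tf.unstack(x, axis=1)            # a list of 6 tensors, each [1, 5]
outputs, state = tf.nn.static_rnn(cell, x_steps, dtype=tf.float32)

print(len(outputs))      # 6: the cell is applied once per time step
print(outputs[0].shape)  # (1, 5): each x_t here is a 5-dim vector, not a scalar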
Now here comes the most important question.
How would TensorFlow know the output/hidden dimension Z_t / H_t?
(Please note the difference between H_t and Z_t: I usually prefer to keep them separate, as H_t goes back into the input (the loop) while Z_t is the output, which is not shown in the figure.)
Would it be the same dimension as X_t?
No. It can be of any different shape. You need to specify it to TensorFlow, and that is num_units: the output size.
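To see that num_units is independent of the input dimension, here is a sketch (again TensorFlow 1.x) keeping n_input = 5 but choosing num_units = 7:

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=7)  # output size chosen by us
x = tf.placeholder(tf.float32, [1, 6, 5])         # n_input is still 5

outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
print(cell.output_size)  # 7
print(outputs.shape)     # (1, 6, 7): Z_t / H_t have num_units entries, not n_input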
Check here in the code:
https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L298-L300
@property
def output_size(self):
    return self._num_units
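You can also read num_units off the trainable variables: in this implementation the single kernel stacks the four gates, so (a sketch, TF 1.x assumed) its shape is [n_input + num_units, 4 * num_units]:

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5)
x = tf.placeholder(tf.float32, [1, 6, 5])
outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

for v in tf.trainable_variables():
    print(v.name, v.shape)
# kernel: (10, 20) = (n_input + num_units, 4 * num_units), one block per gate
# bias:   (20,)    = (4 * num_units,)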
TensorFlow uses the implementation of the LSTM cell defined in Colah's universe, from the following paper:
https://arxiv.org/pdf/1409.2329.pdf