What actually is num_units in an LSTM cell?

I tried very hard to search everywhere, but I couldn't find what num_units in TensorFlow actually is. I tried to relate my question to this question, but I couldn't get a clear explanation there.


In TensorFlow, when creating an LSTM-based RNN, we use the following command

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

As Colah's blog says, this is a basic LSTM cell:

[figure: a basic LSTM cell, from Colah's blog]

Now, suppose my data is:

idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hello: hihell -> ihello
x_data = [[0, 1, 0, 2, 3, 3]]   # hihell
x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3

y_data = [[1, 0, 2, 3, 3, 4]]    # ihello

My input is:

x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3

which is of shape [6,5].
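
A quick shape check (just a sketch, assuming NumPy is available; it reuses the x_one_hot list defined above):

import numpy as np

x = np.array(x_one_hot, dtype=np.float32)
print(x.shape)       # (1, 6, 5): 1 sequence in the batch, 6 time steps, 5 features
print(x[0].shape)    # (6, 5): the matrix that goes through the unrolled LSTM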

In this blog, we have the following picture:

[figure: an unrolled RNN, from the linked blog]

As far as I know, the BasicLSTMCell will unroll for t time steps, where t is my number of rows (please correct me if I am wrong!). For example, in the following figure, the LSTM is unrolled for t = 28 time steps.

[figure: an LSTM unrolled for t = 28 time steps]

In Colah's blog, it's written:

each line carries an entire vector

So, let's see how my [6,5] input matrix will go through this LSTM-based RNN.

[figure: the asker's sketch of the [6, 5] input flowing through the unrolled LSTM]

If my above diagram is correct, then what exactly is num_units (which we defined in the LSTM cell)? Is it a parameter of an LSTM cell?

If num_units is a parameter of a single LSTM cell, then it should be something like:

[figures: the asker's two sketches of num_units as a parameter inside a single LSTM cell]

If the above diagram is correct, then where are those 5 num_units in the following schematic representation of the LSTM cell (according to Colah's blog)?

[figure: Colah's schematic of the LSTM cell internals]


If you can give your answer with a figure, that would be really helpful! You can edit or create a new whiteboard diagram here.

Aaditya Ura asked Mar 11 '18

People also ask

What is the meaning of the number of units in the LSTM cell?

The number of units is the number of neurons connected to the layer holding the concatenated vector of the hidden state and the input (in the original figure, the layer holding both the red and green circles); in that example, 2 neurons are connected to that layer.

What is the difference between cell state and hidden state in LSTM?

The cell state is meant to encode a kind of aggregation of data from all previous time-steps that have been processed, while the hidden state is meant to encode a kind of characterization of the previous time-step's data.

What is units in TensorFlow?

num_units can be interpreted as the analogue of the hidden layer in a feed-forward neural network: the number of nodes in the hidden layer of a feed-forward network corresponds to num_units, the number of LSTM units in an LSTM cell at every time step of the network.


1 Answer

Your understanding is quite correct. Unfortunately, however, there is an inconsistency between TensorFlow's terminology and the literature. To understand it, you need to dig through the TensorFlow implementation code.

A cell in the TensorFlow universe is called an LSTM layer in Colah's universe (i.e. an unrolled version). That is why you always define a single cell, and not a layer, in your TensorFlow architecture. For example,

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

Check the code here.

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L90

The definition of cell in this package differs from the definition used in the literature. In the literature, cell refers to an object with a single scalar output. The definition in this package refers to a horizontal array of such units.

Therefore, in order to understand num_units in TensorFlow, it's best to imagine an unrolled LSTM as below.

[figure: the answerer's diagram of an unrolled LSTM]

In an unrolled version, you have an input X_t, which is a tensor. When you specify an input of the shape

[batch_size,time_steps,n_input]

to TensorFlow, it knows how many times to unroll from your time_steps dimension.
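
As a minimal sketch of this (TF 1.x API, as used in the question; the placeholder, session, and dynamic_rnn wiring are my own assumptions, not the asker's code), feeding the [1, 6, 5] one-hot input through a BasicLSTMCell with num_units=5:

import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn

# [batch_size, time_steps, n_input] = [1, 6, 5], the "hihell" one-hot input
x_one_hot = np.array([[[1, 0, 0, 0, 0],
                       [0, 1, 0, 0, 0],
                       [1, 0, 0, 0, 0],
                       [0, 0, 1, 0, 0],
                       [0, 0, 0, 1, 0],
                       [0, 0, 0, 1, 0]]], dtype=np.float32)

X = tf.placeholder(tf.float32, [None, 6, 5])
cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)  # unrolls over the 6 time steps

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(outputs, feed_dict={X: x_one_hot})
    print(out.shape)  # (1, 6, 5): one H_t of size num_units per time step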

So if X_t is a 1D array in TensorFlow, then in Colah's unrolled version each LSTM cell's x_t becomes a scalar value (note the uppercase X for the vector/array versus the lowercase x for the scalar, as in Colah's figures).

If X_t is a 2D array in TensorFlow, then in Colah's unrolled version each LSTM cell's x_t becomes a 1D array/vector (as in your case here), and so on.

Now here comes the most important question.

How would TensorFlow know the output/hidden dimension of Z_t/H_t?

(Please note the difference between H_t and Z_t: I usually prefer to keep them separate, since H_t goes back into the input (the loop) while Z_t is the output; this is not shown in the figure.)

Would it be the same dimension as X_t?

No. It can be of a different shape. You need to specify it to TensorFlow, and that is num_units: the output size.
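
A small sketch of that point (again TF 1.x; the sizes and scope names here are hypothetical): the width of H_t/Z_t follows num_units, not the width of x_t.

import tensorflow as tf
from tensorflow.contrib import rnn

X = tf.placeholder(tf.float32, [None, 6, 5])   # each x_t is 5-dimensional
cell_small = rnn.BasicLSTMCell(num_units=5)
cell_large = rnn.BasicLSTMCell(num_units=64)

out_small, _ = tf.nn.dynamic_rnn(cell_small, X, dtype=tf.float32, scope="small")
out_large, _ = tf.nn.dynamic_rnn(cell_large, X, dtype=tf.float32, scope="large")

print(out_small.shape)   # (?, 6, 5)
print(out_large.shape)   # (?, 6, 64): the output/hidden size is num_units, independent of the input width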

Check here in the code:

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L298-L300

    @property
    def output_size(self):
        return self._num_units
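
You can check this directly on the cell object (a quick sketch against the TF 1.x API):

from tensorflow.contrib import rnn

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)
print(cell.output_size)   # 5 -> the num_units you passed in
print(cell.state_size)    # LSTMStateTuple(c=5, h=5): both C_t and H_t have num_units components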

TensorFlow uses the implementation of the LSTM cell as described in Colah's universe, from the following paper:

https://arxiv.org/pdf/1409.2329.pdf

user1302884 answered Oct 03 '22