I have searched everywhere, but I couldn't find what num_units in TensorFlow actually is. I tried to relate my question to this question, but I couldn't get a clear explanation there.
In TensorFlow, when creating an LSTM-based RNN, we use the following command
cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)
As Colah's blog says, this is a basic LSTM cell:
[figure: a single LSTM cell, from Colah's blog]
Now, suppose my data is:
idx2char = ['h', 'i', 'e', 'l', 'o']
# Teach hello: hihell -> ihello
x_data = [[0, 1, 0, 2, 3, 3]] # hihell
x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3
y_data = [[1, 0, 2, 3, 3, 4]] # ihello
My input is the x_one_hot tensor above, which is of shape [6, 5] per example: six time steps, each a 5-dimensional one-hot vector.
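(As a quick sanity check, here is a minimal NumPy sketch of those shapes; the batch dimension of size 1 comes from the outer pair of brackets:)

import numpy as np

# x_one_hot as defined above: one example, six 5-dimensional one-hot rows
x = np.array(x_one_hot, dtype=np.float32)
print(x.shape)     # (1, 6, 5): batch_size=1, time_steps=6, n_input=5
print(x[0].shape)  # (6, 5): the [6, 5] matrix discussed here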
In this blog, we have the following picture:
[figure: an RNN unrolled across time steps]
As far as I know, the BasicLSTMCell will unroll for t time steps, where t is my number of rows (please correct me if I am wrong!). For example, in the following figure the LSTM is unrolled for t = 28 time steps:
[figure: an LSTM unrolled for t = 28 time steps]
In Colah's blog, it is written that "each line carries an entire vector".
So, let's see how my [6, 5] input matrix will go through this LSTM-based RNN.
If my diagram above is correct, then what exactly is num_units (which we defined in the LSTM cell)? Is it a parameter of an LSTM cell?
If num_units is a parameter of a single LSTM cell, then it should be something like this:
[figure: my sketch of num_units inside a single LSTM cell]
If the above diagram is correct, then where are those 5 num_units in the following schematic representation of the LSTM cell (according to Colah's blog)?
[figure: Colah's schematic of a single LSTM cell]
If you can give your answer with a figure, that would be really helpful!
The number of units is the number of neurons connected to the layer that holds the concatenated vector of hidden state and input (the layer holding both the red and green circles below). In this example, there are 2 neurons connected to that layer.
[figure: 2 units connected to the layer of red and green circles]
The cell state is meant to encode a kind of aggregation of data from all previous time-steps that have been processed, while the hidden state is meant to encode a kind of characterization of the previous time-step's data.
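For concreteness, here is a minimal sketch (assuming TensorFlow 1.x, where state_is_tuple=True makes the cell return an LSTMStateTuple) showing that both states carry num_units values per example:

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5, state_is_tuple=True)
x = tf.placeholder(tf.float32, [1, 6, 5])  # [batch_size, time_steps, n_input]
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

print(state.c.shape)  # (1, 5): cell state, the running aggregate
print(state.h.shape)  # (1, 5): hidden state from the last time step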
num_units can be interpreted as analogous to the hidden layer of a feed-forward neural network: the number of nodes in the hidden layer of a feed-forward network is equivalent to the num_units LSTM units in an LSTM cell at every time step of the network.
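To make the analogy concrete, here is a rough sketch (TensorFlow 1.x assumed): a feed-forward hidden layer with 5 nodes turns one input vector into a 5-dimensional activation, and an LSTM cell with num_units=5 does the same for every time step:

import tensorflow as tf

x_step = tf.placeholder(tf.float32, [1, 5])       # a single time step's input
hidden = tf.layers.dense(x_step, units=5)         # feed-forward hidden layer, 5 nodes
print(hidden.shape)                               # (1, 5)

x_seq = tf.placeholder(tf.float32, [1, 6, 5])     # the full sequence
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5)  # the recurrent "hidden layer"
outputs, _ = tf.nn.dynamic_rnn(cell, x_seq, dtype=tf.float32)
print(outputs.shape)                              # (1, 6, 5): a 5-dim vector per step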
Your understanding is quite correct. However, unfortunately, there is an inconsistency between the TensorFlow terminology and the literature; to see it, you need to dig through the TensorFlow implementation code.
A cell in the TensorFlow universe is called an LSTM layer in Colah's universe (i.e., the unrolled version). That is why you always define a single cell, and not a layer, in your TensorFlow architecture. For example,
cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)
Check the code here.
https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L90
The definition of cell in this package differs from the definition used in the literature. In the literature, cell refers to an object with a single scalar output. The definition in this package refers to a horizontal array of such units.
Therefore, in order to understand num_units in TensorFlow, it's best to imagine an unrolled LSTM as below:
[figure: an unrolled LSTM with vector inputs X_t]
In the unrolled version, you have an input X_t, which is a tensor. When you specify an input of shape [batch_size, time_steps, n_input] to TensorFlow, it knows how many times to unroll it from your time_steps parameter.
So if you have X_t as a 1D array in TensorFlow, then in Colah's unrolled version each LSTM cell input x_t becomes a scalar value (note the uppercase X for the vector/array and the lowercase x for the scalar, as in Colah's figures). If you have X_t as a 2D array in TensorFlow, then in Colah's unrolled version each x_t becomes a 1D array/vector (as in your case here), and so on.
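A small sketch of that unrolling (TensorFlow 1.x; tf.nn.static_rnn is used here because it makes the per-step applications explicit):

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5)
x = tf.placeholder(tf.float32, [1, 6, 5])  # [batch_size, time_steps, n_input]

x_steps = tf.unstack(x, axis=1)            # a list of 6 tensors, each [1, 5]
outputs, state = tf.nn.static_rnn(cell, x_steps, dtype=tf.float32)

print(len(outputs))      # 6: the cell is applied once per time step
print(outputs[0].shape)  # (1, 5): each x_t here is a 5-dim vector, not a scalar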
Now here comes the most important question.
How would TensorFlow know the output/hidden dimension Z_t / H_t?
(Please note the difference between H_t and Z_t: I usually prefer to keep them separate, as H_t goes back into the input (the loop) while Z_t is the output, which is not shown in the figure.)
Would it be the same dimension as X_t?
No. It can be of any different shape. You need to specify it to TensorFlow, and that is num_units: the output size.
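To see that num_units is independent of the input dimension, here is a sketch (again TensorFlow 1.x) keeping n_input = 5 but choosing num_units = 7:

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=7)  # output size chosen by us
x = tf.placeholder(tf.float32, [1, 6, 5])         # n_input is still 5

outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
print(cell.output_size)  # 7
print(outputs.shape)     # (1, 6, 7): Z_t / H_t have num_units entries, not n_input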
Check here in the code:
https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L298-L300
@property
def output_size(self):
    return self._num_units
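You can also read num_units off the trainable variables: in this implementation the single kernel stacks the four gates, so (a sketch, TF 1.x assumed) its shape is [n_input + num_units, 4 * num_units]:

import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5)
x = tf.placeholder(tf.float32, [1, 6, 5])
outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

for v in tf.trainable_variables():
    print(v.name, v.shape)
# kernel: (10, 20) = (n_input + num_units, 4 * num_units), one block per gate
# bias:   (20,)    = (4 * num_units,)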
TensorFlow uses the implementation of the LSTM cell defined in Colah's universe, from the following paper:
https://arxiv.org/pdf/1409.2329.pdf