 

How to interpret clearly the meaning of the units parameter in Keras?

Tags:

python

keras

lstm

I am wondering how LSTMs work in Keras. In this tutorial, for example, as in many others, you can find something like this:

model.add(LSTM(4, input_shape=(1, look_back)))

What does the "4" mean. Is it the number of neuron in the layer. By neuron, I mean something that for each instance gives a single output ?

Actually, I found this brilliant discussion but wasn't really convinced by the explanation mentioned in the reference given.

On the scheme, one can see the num_units illustrated, and I think I am not wrong in saying that each of these units is a very atomic LSTM unit (i.e. the 4 gates). However, how are these units connected? If I am right (but not sure), x_(t-1) is of size nb_features, so each feature would be an input of a unit and num_units must be equal to nb_features, right?

Now, let's talk about Keras. I have read this post and the accepted answer and am having trouble with it. Indeed, the answer says:

Basically, the shape is like (batch_size, timespan, input_dim), where input_dim can be different from the unit

In which case? I am having trouble reconciling this with the previous reference...

Moreover, it says,

LSTM in Keras only define exactly one LSTM block, whose cells is of unit-length.

Okay, but how do I define a full LSTM layer? Is it the input_shape that implicitly creates as many blocks as the number of time_steps (which, according to me, is the first element of the input_shape parameter in my piece of code)?

Thanks for enlightening me.

EDIT: would it also be possible to explain clearly how to reshape data of, say, size (n_samples, n_features) for a stateful LSTM model? How to deal with time_steps and batch_size?

asked Aug 20 '18 by MysteryGuy

3 Answers

First, units in LSTM is NOT the number of time_steps.

Each LSTM cell (present at a given time_step) takes in an input x and forms a hidden state vector a; the length of this hidden state vector is what is called the units in LSTM (Keras).

You should keep in mind that there is only one RNN cell created by the code

keras.layers.LSTM(units, activation='tanh', …… )

and the RNN operations are repeated Tx times by the class itself.

I've linked this to help you understand it better with a very simple code example.
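For illustration, here is a minimal sketch (not the linked code; tf.keras and the shapes are assumed here) showing that units sets the length of the hidden state vector, not the number of time steps:

import numpy as np
from tensorflow import keras

Tx, n_features, units = 5, 3, 4           # assumed: 5 time steps, 3 input features, 4 hidden units
x = np.random.rand(2, Tx, n_features)     # a batch of 2 sequences
lstm = keras.layers.LSTM(units)           # one cell, reused across all Tx steps
print(lstm(x).shape)                      # (2, 4): the last hidden state has length `units`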

answered Nov 06 '22 by ajaysinghnegi

You can (sort of) think of it exactly as you think of fully connected layers. Units are neurons.

The dimension of the output is the number of neurons, as with most of the well known layer types.

The difference is that in LSTMs these neurons will not be completely independent of each other; they will intercommunicate due to the mathematical operations under the hood.

Before going further, it might be interesting to take a look at this very complete explanation about LSTMs, their inputs/outputs and the usage of stateful=True/False: Understanding Keras LSTMs. Notice that your input shape should be input_shape=(look_back, 1). The input shape goes as (time_steps, features).
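For instance, a minimal sketch of the corrected call (tf.keras assumed, look_back=10 is just an illustrative value):

from tensorflow import keras

look_back = 10                                                 # assumed value for illustration
model = keras.Sequential()
model.add(keras.layers.LSTM(4, input_shape=(look_back, 1)))    # (time_steps, features)
model.add(keras.layers.Dense(1))
model.summary()                                                # LSTM output shape: (None, 4)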

While this is a series of fully connected layers:

  • hidden layer 1: 4 units
  • hidden layer 2: 4 units
  • output layer: 1 unit

This is a series of LSTM layers:

Where the input has shape (batch_size, arbitrary_steps, 3)


Each LSTM layer will keep reusing the same units/neurons over and over until all the arbitrary timesteps in the input are processed.

  • The output will have shape:
    • (batch, arbitrary_steps, units) if return_sequences=True.
    • (batch, units) if return_sequences=False.
  • The memory states will have a size of units.
  • The inputs processed from the last step will have size of units.

To be really precise, there will be two groups of units, one working on the raw inputs, the other working on already processed inputs coming from the last step. Due to the internal structure, each group will have a number of parameters 4 times bigger than the number of units (this 4 is not related to the image, it's fixed).
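A quick way to verify this factor of 4 is to count the layer's parameters; a small sketch with assumed sizes (tf.keras, units=4, 3 input features as in the example):

from tensorflow import keras

units, input_dim = 4, 3
lstm = keras.layers.LSTM(units)
lstm.build((None, None, input_dim))             # (batch, timesteps, features)
print(lstm.count_params())                      # 128
print(4 * units * (units + input_dim + 1))      # 128 as well: 4 gates x (kernel + recurrent kernel + bias)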

Flow (see the sketch after this list):

  • Takes an input with n steps and 3 features
  • Layer 1:
    • For each time step in the inputs:
      • Uses 4 units on the inputs to get a size 4 result
      • Uses 4 recurrent units on the outputs of the previous step
    • Outputs the last (return_sequences=False) or all (return_sequences=True) steps
      • output features = 4
  • Layer 2:
    • Same as layer 1
  • Layer 3:
    • For each time step in the inputs:
      • Uses 1 unit on the inputs to get a size 1 result
      • Uses 1 unit on the outputs of the previous step
    • Outputs the last (return_sequences=False) or all (return_sequences=True) steps
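Put together, a minimal sketch of the stack described above (tf.keras assumed, 3 input features, arbitrary number of steps):

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(None, 3)),                  # (batch, arbitrary_steps, 3)
    keras.layers.LSTM(4, return_sequences=True),   # layer 1 -> (batch, steps, 4)
    keras.layers.LSTM(4, return_sequences=True),   # layer 2 -> (batch, steps, 4)
    keras.layers.LSTM(1),                          # layer 3 -> (batch, 1), last step only
])
model.summary()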
answered Nov 06 '22 by Daniel Möller

The number of units is the size (length) of the internal vector states, h and c, of the LSTM. That is, no matter the shape of the input, it is upscaled (by a dense transformation) by the various kernels for the i, f, and o gates. The details of how the resulting latent features are transformed into h and c are described in the linked post. In your example, the input shape of the data

(batch_size, timesteps, input_dim)

will be transformed to

(batch_size, timesteps, 4)

if return_sequences is true; otherwise only the last h will be emitted, making it (batch_size, 4). I would recommend using a much higher latent dimension, perhaps 128 or 256, for most problems.
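A small sketch illustrating both cases (tf.keras assumed; batch of 8, 10 timesteps, 16 input features chosen for illustration):

import numpy as np
from tensorflow import keras

x = np.random.rand(8, 10, 16)                                   # (batch_size, timesteps, input_dim)
print(keras.layers.LSTM(4, return_sequences=True)(x).shape)     # (8, 10, 4)
print(keras.layers.LSTM(4, return_sequences=False)(x).shape)    # (8, 4)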

answered Nov 06 '22 by modesitt