
Implementation of LSTM in Keras

I'm using Keras with the TensorFlow backend.

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Activation

model = Sequential()
model.add(Masking(mask_value = 0., input_shape = (MAX_LENGTH, 1)))
model.add(LSTM(16, input_shape = (BATCH_SIZE, MAX_LENGTH, 1), return_sequences = False))
model.add(Dense(units = 2))
model.add(Activation("sigmoid"))
model.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])

This Python code works, but I wonder whether it creates 16 LSTM blocks with 1 cell each, or 1 LSTM block with 16 cells.

Thanks in advance!

[Image: LSTM architecture]

asked Feb 05 '19 by I-was-a-Ki




3 Answers

OK, so your question got me thinking and I think I overdid it, but here goes nothing. Here's a snippet of code I wrote to get some insight into the LSTM implementation.

from keras.layers import LSTM
from keras.models import Sequential

model = Sequential()
# 10 units; the input is a sequence of 20 timesteps with 30 features each
model.add(LSTM(10, input_shape=(20, 30), return_sequences=True))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
weights = model.get_weights()

Now, by inspecting the weight shapes, we can get an intuition of what's happening.

In [12]: weights[0].shape
Out[12]: (30, 40)
In [14]: weights[1].shape
Out[14]: (10, 40)
In [15]: weights[2].shape
Out[15]: (40,)

And here is a description of them:

In [26]: model.weights
Out[26]: 
[<tf.Variable 'lstm_4/kernel:0' shape=(30, 40) dtype=float32_ref>,
 <tf.Variable 'lstm_4/recurrent_kernel:0' shape=(10, 40) dtype=float32_ref>,
 <tf.Variable 'lstm_4/bias:0' shape=(40,) dtype=float32_ref>]

Those are the only weights available. I also looked at the Keras implementation at https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1765

So you can see that @gorjan was right: it implements one cell, meaning the 4 gates (for the recurrent input as well as the sequence input), along with their biases.
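
A quick sanity check of that claim (a minimal sketch, reusing the model from the snippet above; the factor of 4 is the four gates):

units, features = 10, 30  # the values used in the snippet above

kernel, recurrent_kernel, bias = model.get_weights()

# Keras concatenates the weights of the four gates (input, forget, cell,
# output) along the last axis, which is where the factor of 4 comes from.
assert kernel.shape == (features, 4 * units)         # (30, 40)
assert recurrent_kernel.shape == (units, 4 * units)  # (10, 40)
assert bias.shape == (4 * units,)                    # (40,)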

The "layer" thinking here should be applied to the number of times the LSTM will be unrolled, in this case 30.

Hope this helps.

answered Sep 24 '22 by Diego Aguado


It's 1 block with 16 cells, AFAIK.
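
You can verify this from the weight shapes of the model in the question. A minimal sketch (MAX_LENGTH = 10 is just a placeholder value):

from keras.layers import LSTM, Masking
from keras.models import Sequential

MAX_LENGTH = 10  # placeholder; the weight shapes don't depend on it

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(MAX_LENGTH, 1)))
model.add(LSTM(16, return_sequences=False))

kernel, recurrent_kernel, bias = model.layers[1].get_weights()
print(kernel.shape)            # (1, 64): 1 input feature, 4 gates * 16 cells
print(recurrent_kernel.shape)  # (16, 64): one shared recurrent kernel, not 16 blocks
print(bias.shape)              # (64,)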

answered Sep 21 '22 by Slam


When you are using cells like LSTM or GRU, you don't have the notion of layers per se. What you actually have is a cell that implements a few gates. Each of the gates is a separate weight matrix that the model will learn during training. In your case, for example, you will have 1 cell, where each of the gate matrices has the dimensions (feature_size_of_your_input, 16). I suggest you read http://colah.github.io/posts/2015-08-Understanding-LSTMs/ really carefully before you start implementing this kind of stuff. Otherwise you are just using these models as black boxes, without understanding what is happening under the hood.
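
To make the per-gate matrices concrete, here is a minimal sketch that slices the concatenated kernel of such a cell into its four gate matrices (Keras stores them in the order input, forget, cell, output):

import numpy as np
from keras.layers import LSTM
from keras.models import Sequential

units, features = 16, 1  # matching the model in the question

model = Sequential()
model.add(LSTM(units, input_shape=(None, features)))

kernel = model.get_weights()[0]  # shape (features, 4 * units)

# Split the concatenated kernel into the four per-gate matrices.
W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=1)
print(W_i.shape)  # (1, 16): one matrix per gate, each (features, units)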

answered Sep 20 '22 by gorjan