I'm using Keras with the TensorFlow backend.
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Activation

model = Sequential()
model.add(Masking(mask_value = 0., input_shape = (MAX_LENGTH, 1)))
model.add(LSTM(16, return_sequences = False))  # input shape is inferred from the Masking layer
model.add(Dense(units = 2))
model.add(Activation("sigmoid"))
model.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])
This Python code works, but I wonder whether there are 16 LSTM blocks with 1 cell each, or 1 LSTM block with 16 cells.
Thanks in advance!
The weight matrix W contains separate weights for the current input vector and the previous hidden state, for each gate. Just like a plain recurrent neural network, an LSTM also generates an output at each time step, and this output is used to train the network via gradient descent.
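To make this concrete, here is a minimal NumPy sketch of a single LSTM time step. The names lstm_step, W, U, and b are mine, not from any library; the slicing assumes Keras' convention of concatenating the input (i), forget (f), cell-candidate (c), and output (o) gate weights along the last axis:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W: (input_dim, 4*units) input weights, U: (units, 4*units) recurrent
    # weights, b: (4*units,) biases -- one block of `units` columns per gate.
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b           # all four gates in one matmul
    i = sigmoid(z[..., 0*units:1*units])   # input gate
    f = sigmoid(z[..., 1*units:2*units])   # forget gate
    g = np.tanh(z[..., 2*units:3*units])   # candidate cell state
    o = sigmoid(z[..., 3*units:4*units])   # output gate
    c_t = f * c_prev + i * g               # new cell state
    h_t = o * np.tanh(c_t)                 # new hidden state / output
    return h_t, c_t

Running this step over the time axis of a sequence, carrying h_t and c_t forward, is exactly the unrolling that gradient descent backpropagates through.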
OK, so your question got me thinking and I think I overdid it, but here goes nothing. Here's a snippet of code I wrote to get some insight into the LSTM implementation.
from keras.layers import LSTM
from keras.models import Sequential

model = Sequential()
model.add(LSTM(10, input_shape=(20, 30), return_sequences=True))  # 10 units, 20 time steps, 30 features per step
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

weights = model.get_weights()
Now, by inspecting the weight shapes we can get some intuition about what's happening.
In [12]: weights[0].shape
Out[12]: (30, 40)
In [14]: weights[1].shape
Out[14]: (10, 40)
In [15]: weights[2].shape
Out[15]: (40,)
And here is a description of them:
In [26]: model.weights
Out[26]:
[<tf.Variable 'lstm_4/kernel:0' shape=(30, 40) dtype=float32_ref>,
<tf.Variable 'lstm_4/recurrent_kernel:0' shape=(10, 40) dtype=float32_ref>,
<tf.Variable 'lstm_4/bias:0' shape=(40,) dtype=float32_ref>]
Those are the only weights available. I also looked at the Keras implementation at https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1765
So you can see that @gorjan was right: it implements one cell, meaning the four gates (for the recurrent input as well as the sequence input), along with their biases.
The "layer" thinking here should be applied to the number of times the LSTM will be unrolled, which equals the number of time steps: in this case 20.
Hope this helps.
It's 1 block with 16 cells, AFAIK.
When you are using cells (LSTM, GRU), you don't have the notion of layers per se. What you actually have is a cell that implements a few gates. Each of the gates is a separate weight matrix that the model will learn during training. For example, in your case, what you will have is 1 cell, where each of the gate matrices will have dimension (feature_size_of_your_input, 16). I suggest that you read http://colah.github.io/posts/2015-08-Understanding-LSTMs/ really carefully before you start implementing this kind of stuff. Otherwise, you are just using it as a black-box model without understanding what is happening under the hood.
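As a quick sanity check on the model from the question (MAX_LENGTH below is a hypothetical placeholder), the weight shapes confirm one cell with 16 units, with the four gate matrices concatenated along the last axis:

from keras.models import Sequential
from keras.layers import Masking, LSTM

MAX_LENGTH = 100  # hypothetical value, only needed to build the model

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(MAX_LENGTH, 1)))
model.add(LSTM(16))

kernel, recurrent_kernel, bias = model.layers[1].get_weights()
print(kernel.shape)            # (1, 64):  input_dim=1, 4 gates * 16 units
print(recurrent_kernel.shape)  # (16, 64): 16 units,   4 gates * 16 units
print(bias.shape)              # (64,)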