The first argument in a normal Dense layer is also units, and it is the number of neurons/nodes in that layer. A standard LSTM unit, however, looks like the following:
(This is a reworked version of "Understanding LSTM Networks")
In Keras, when I create an LSTM object like this, LSTM(units=N, ...), am I actually creating N of these LSTM units? Or is it the size of the "Neural Network" layers inside the LSTM unit, i.e., the W's in the formulas? Or is it something else?
For context, I'm working based on this example code.
The documentation at https://keras.io/layers/recurrent/ says:
units: Positive integer, dimensionality of the output space.
It makes me think it is the number of outputs from the Keras LSTM "layer" object, meaning the next layer will have N inputs. Does that mean there actually exist N of these LSTM units in the LSTM layer, or maybe that exactly one LSTM unit is run for N iterations, outputting N of these h[t] values, from, say, h[t-N] up to h[t]?
If it only defines the number of outputs, does that mean the input can still be, say, just one, or do we have to manually create lagging input variables x[t-N] to x[t], one for each LSTM unit defined by the units=N argument?
As I'm writing this, it occurs to me what the argument return_sequences does. If set to True, all N outputs are passed forward to the next layer, while if it is set to False, only the last h[t] output is passed to the next layer. Am I right?
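A quick way to check that hypothesis (a minimal sketch, assuming TensorFlow 2.x Keras; the batch size, timesteps and feature count below are arbitrary toy values):

```python
import numpy as np
import tensorflow as tf

# Toy input: batch_size=4, timesteps=7, input_dim=3
x = np.random.rand(4, 7, 3).astype("float32")

seq_layer = tf.keras.layers.LSTM(units=5, return_sequences=True)
last_layer = tf.keras.layers.LSTM(units=5, return_sequences=False)

print(seq_layer(x).shape)   # (4, 7, 5) -> one h[t] per timestep
print(last_layer(x).shape)  # (4, 5)    -> only the final h[t]
```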
stateful=True means that the final state of every batch is kept and passed as the initial state for the next batch.
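A small sketch of what that means in practice (assuming TensorFlow 2.x Keras; the shapes and layer sizes are arbitrary):

```python
import numpy as np
import tensorflow as tf

# stateful=True needs a fixed batch size, hence batch_input_shape
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, stateful=True, batch_input_shape=(2, 10, 3)),
    tf.keras.layers.Dense(1),
])

chunk_a = np.random.rand(2, 10, 3).astype("float32")
chunk_b = np.random.rand(2, 10, 3).astype("float32")

model.predict(chunk_a)            # the final state of this batch...
model.predict(chunk_b)            # ...is the initial state of this one
model.layers[0].reset_states()    # clear the carried state manually
```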
A Long Short-Term Memory network, or LSTM, is a variation of a recurrent neural network (RNN) that is quite effective at predicting long sequences of data, like sentences and stock prices over a period of time. It differs from a normal feedforward network because there is a feedback loop in its architecture.
In LSTM(32), the 32 is the "units".
You can check this question for further information, although it is based on Keras-1.x API.
Basically, the unit means the dimension of the inner cells in the LSTM. Because in an LSTM the dimension of the inner cell (C_t and C_{t-1} in the graph), the output mask (o_t in the graph) and the hidden/output state (h_t in the graph) must all have the SAME dimension, your output's dimension is unit-length as well.
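You can verify this directly (a sketch, assuming TensorFlow 2.x Keras; return_state=True makes the layer also return h_t and C_t):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(4, 7, 3).astype("float32")        # (batch_size, timespan, input_dim)
layer = tf.keras.layers.LSTM(32, return_state=True)  # 32 is the "units"

output, h, c = layer(x)
print(output.shape, h.shape, c.shape)  # (4, 32) (4, 32) (4, 32) -- all unit-length
```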
And LSTM in Keras defines exactly one LSTM block, whose cells are of unit length. If you set return_sequences=True, it will return something with shape (batch_size, timespan, unit). If False, it just returns the last output, with shape (batch_size, unit).
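For example (a sketch, assuming TensorFlow 2.x Keras; the layer sizes are arbitrary), a stacked LSTM needs return_sequences=True on the first layer, because the second LSTM expects a 3-D sequence input:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True, input_shape=(20, 3)),  # (batch_size, 20, 64)
    tf.keras.layers.LSTM(32),                                              # (batch_size, 32)
    tf.keras.layers.Dense(1),
])
model.summary()
```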
As for the input, you should provide input for every timestep. Basically, the shape is (batch_size, timespan, input_dim), where input_dim can be different from the unit. If you just want to provide input at the first step, you can simply pad your data with zeros at the other time steps.
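A sketch of that input layout (assuming TensorFlow 2.x Keras; all numbers are arbitrary):

```python
import numpy as np
import tensorflow as tf

batch_size, timespan, input_dim = 4, 10, 3
x = np.zeros((batch_size, timespan, input_dim), dtype="float32")
x[:, 0, :] = np.random.rand(batch_size, input_dim)  # real data only at the first step

layer = tf.keras.layers.LSTM(16)  # input_dim=3 is independent of units=16
print(layer(x).shape)             # (4, 16) -- output length equals units, not input_dim
```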