On https://keras.io/layers/recurrent/ I see that LSTM layers have a kernel and a recurrent_kernel. What is their meaning? In my understanding, we need weights for the 4 gates of an LSTM cell. However, in the Keras implementation, kernel has a shape of (input_dim, 4*units) and recurrent_kernel has a shape of (units, 4*units). So, are both of them somehow implementing the gates?
The main reason for stacking LSTM layers is to allow for greater model complexity. In a simple feedforward net, we stack layers to create a hierarchical feature representation of the input data, which is then used for some machine learning task. The same applies to stacked LSTMs.
The number of units is the number of neurons connected to the layer that holds the concatenated vector of the hidden state and the input. If, for example, 2 neurons are connected to that concatenated layer, the LSTM has 2 units.
A Long Short-Term Memory network, or LSTM, is a variant of a recurrent neural network (RNN) that is quite effective at modeling long sequences of data, such as sentences or stock prices over a period of time. It differs from a normal feedforward network in that there is a feedback loop in its architecture.
The output (with the default return_sequences=False) is a 2D array of real numbers. The second dimension is the dimensionality of the output space, defined by the units parameter of the Keras LSTM implementation.
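A quick sketch of that shape rule, assuming an arbitrary batch of 4 sequences with 10 timesteps and 3 features, and units=2 to match the example above:

import tensorflow as tf

x = tf.random.uniform((4, 10, 3))      # (batch, timesteps, features)
y = tf.keras.layers.LSTM(units=2)(x)   # by default only the last timestep's output is returned
print(y.shape)                         # (4, 2): 2D, and the second dimension equals units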
Building the LSTM in Keras: first, we add the Keras LSTM layer, and following this, we add dropout layers to prevent overfitting. For the LSTM layer, we use 50 units, which is the dimensionality of the output space. The return_sequences parameter is set to True so that the layer returns the full sequence of outputs rather than only the last one.
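A minimal sketch of such a model, assuming tf.keras and an illustrative input of 60 timesteps with 1 feature per step (the dropout rate and the final Dense layer are also assumptions, not part of the original description):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(60, 1)),                     # 60 timesteps, 1 feature per step
    tf.keras.layers.LSTM(50, return_sequences=True),   # 50 units = dimensionality of the output space
    tf.keras.layers.Dropout(0.2),                      # dropout to guard against overfitting
    tf.keras.layers.LSTM(50),                          # final LSTM returns only the last timestep
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
model.summary()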
Keras layers API. Layers are the basic building blocks of neural networks in Keras. A layer consists of a tensor-in tensor-out computation function (the layer's call method) and some state, held in TensorFlow variables (the layer's weights). A Layer instance is callable, much like a function.
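For example, a small sketch with the standard tf.keras API (the Dense layer and the random input are just placeholders):

import tensorflow as tf

layer = tf.keras.layers.Dense(32, activation="relu")   # a Layer instance

x = tf.random.uniform((10, 20))            # batch of 10 vectors with 20 features
y = layer(x)                               # calling the layer runs its computation...
print(y.shape)                             # (10, 32)
print([w.shape for w in layer.weights])    # ...and creates its state: kernel (20, 32), bias (32,)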
In Keras, you can make a layer return the full sequence instead of only the last timestep's output by setting return_sequences=True. This is required when you are stacking multiple LSTM layers together, because an LSTM expects a three-dimensional input tensor.
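To illustrate, assuming a toy input of shape (batch=32, timesteps=10, features=8):

import tensorflow as tf

x = tf.random.uniform((32, 10, 8))                         # the 3D tensor an LSTM expects

seq = tf.keras.layers.LSTM(16, return_sequences=True)(x)   # full sequence: shape (32, 10, 16), still 3D
last = tf.keras.layers.LSTM(16)(seq)                       # can be stacked on seq because it is 3D; shape (32, 16)

print(seq.shape, last.shape)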
This ease of creating neural networks is what makes Keras the preferred deep learning framework for many. There are different types of Keras layers available for different purposes when designing your neural network architecture.
Correct me if I'm wrong, but if you take a look at the LSTM equations (written out below), you have 4 W matrices that transform the input and 4 U matrices that transform the hidden state.
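In a standard formulation (the notation this answer assumes, with σ the sigmoid and ∘ elementwise multiplication):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}
$$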
Keras saves these sets of 4 matrices in the kernel and recurrent_kernel weight arrays. From the code that uses them:
# each gate's matrix occupies a block of `units` columns in the concatenated arrays
self.kernel_i = self.kernel[:, :self.units]                        # input gate
self.kernel_f = self.kernel[:, self.units: self.units * 2]         # forget gate
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]     # cell candidate
self.kernel_o = self.kernel[:, self.units * 3:]                    # output gate

self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]
Apparently the 4 matrices are stored inside the weight arrays concatenated along the second dimension, which explains the weight array shapes.
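You can check this on a freshly built layer (a small sketch; units=2 and input_dim=3 are arbitrary choices):

import tensorflow as tf

layer = tf.keras.layers.LSTM(units=2)
_ = layer(tf.random.uniform((1, 10, 3)))   # calling the layer builds it: input_dim = 3, units = 2

kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape)                        # (3, 8)  == (input_dim, 4 * units)
print(recurrent_kernel.shape)              # (2, 8)  == (units, 4 * units)
print(bias.shape)                          # (8,)    == (4 * units,)

kernel_i = kernel[:, :2]                   # the input-gate block, matching the slicing above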