I'm currently training a recurrent neural network for weather forecasting, using an LSTM layer. The network itself is pretty simple and looks roughly like this:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count), return_sequences=False))
model.add(Dense(feature_count))
model.add(Activation("linear"))
The weights of the LSTM layer have the following shapes:
for weight in model.get_weights():  # weights from the Dense layer omitted
    print(weight.shape)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
In short, it looks like there are four "elements" in this LSTM layer. I'm wondering now how to interpret them:
Where is the time_steps parameter in this representation? How does it influence the weights?
I've read that an LSTM consists of several blocks, like an input gate and a forget gate. If those are represented in these weight matrices, which matrix belongs to which gate?
Is there any way to see what the network has learned? For example, how much does it take from the last time step (t-1, if we want to forecast t) and how much from t-2, etc.? It would be interesting to know if we could read from the weights that the input at t-5 is completely irrelevant, for example.
Clarifications and hints would be greatly appreciated.
(Figure: LSTM model architecture; in this model, the LSTM hidden state size is 3.)
In the LSTM figure, we can see that there are 8 different weight matrices (4 associated with the hidden state and 4 associated with the input vector) and 4 different bias vectors. To better understand the operations inside the LSTM cell, we can look at the following equations.
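In the standard formulation (σ is the logistic sigmoid and ⊙ denotes element-wise multiplication; depending on the Keras version, the gate activation may be hard_sigmoid instead of σ), the cell computes:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$
Each of W_i, W_f, W_c, W_o multiplies the input vector x_t, and each of U_i, U_f, U_c, U_o multiplies the previous hidden state h_{t-1}, which accounts for the 8 weight matrices and 4 bias vectors.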
In general, there are no firm guidelines for determining the number of layers or the number of memory cells in an LSTM. The numbers required might depend on several aspects of the problem, such as the complexity of the dataset, the number of features, the number of data points, etc.
If you are using Keras 2.2.0, when you print
print(model.layers[0].trainable_weights)
you should see three tensors: lstm_1/kernel:0, lstm_1/recurrent_kernel:0 and lstm_1/bias:0.
One of the dimensions of each tensor is
4 * number_of_units
where number_of_units is the number of units (neurons) in the layer. Try:
units = int(int(model.layers[0].trainable_weights[0].shape[1]) / 4)
print("Number of units:", units)
That is because each tensor contains the weights for the four LSTM components, concatenated in this order:
i (input gate), f (forget gate), c (candidate cell state) and o (output gate)
Therefore, in order to extract the per-gate weights, you can simply use the slice operator:
# kernel (input weights), recurrent kernel and bias of the LSTM layer
W = model.layers[0].get_weights()[0]  # shape (feature_count, 4 * units)
U = model.layers[0].get_weights()[1]  # shape (units, 4 * units)
b = model.layers[0].get_weights()[2]  # shape (4 * units,)

# input weights per gate
W_i = W[:, :units]
W_f = W[:, units: units * 2]
W_c = W[:, units * 2: units * 3]
W_o = W[:, units * 3:]

# recurrent weights per gate
U_i = U[:, :units]
U_f = U[:, units: units * 2]
U_c = U[:, units * 2: units * 3]
U_o = U[:, units * 3:]

# biases per gate
b_i = b[:units]
b_f = b[units: units * 2]
b_c = b[units * 2: units * 3]
b_o = b[units * 3:]
Source: keras code
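To connect these slices back to the gate equations, here is a minimal NumPy sketch of a single LSTM step. It assumes the W_*, U_* and b_* arrays extracted above and uses a plain sigmoid for the gates; some Keras versions default to hard_sigmoid for the recurrent (gate) activation, so the outputs may not match Keras exactly.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    # Gate activations from the sliced input weights, recurrent weights and biases.
    i = sigmoid(x_t @ W_i + h_prev @ U_i + b_i)       # input gate
    f = sigmoid(x_t @ W_f + h_prev @ U_f + b_f)       # forget gate
    c_bar = np.tanh(x_t @ W_c + h_prev @ U_c + b_c)   # candidate cell state
    o = sigmoid(x_t @ W_o + h_prev @ U_o + b_o)       # output gate
    c_t = f * c_prev + i * c_bar                      # new cell state
    h_t = o * np.tanh(c_t)                            # new hidden state
    return h_t, c_t

# Example: one step from zero initial states with a random input vector.
x_t = np.random.randn(W.shape[0])
h_t, c_t = lstm_step(x_t, np.zeros(units), np.zeros(units))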