I'm currently training a recurrent neural network for weather forecasting, using an LSTM layer. The network itself is pretty simple and looks roughly like this:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count), return_sequences=False))
model.add(Dense(feature_count))
model.add(Activation("linear"))
The weights of the LSTM layer have the following shapes:
for weight in model.get_weights():  # weights from the Dense layer omitted
    print(weight.shape)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
In short, it looks like there are four "elements" in this LSTM layer. I'm wondering now how to interpret them:
Where is the time_steps parameter in this representation? How does it influence the weights?
I've read that an LSTM consists of several blocks, like an input gate and a forget gate. If those are represented in these weight matrices, which matrix belongs to which gate?
Is there any way to see what the network has learned? For example, how much does it take from the last time step (t-1, if we want to forecast t) and how much from t-2, etc.? It would be interesting to know if we could read from the weights that the input at t-5 is completely irrelevant, for example.
Clarifications and hints would be greatly appreciated.
(Figure: LSTM model architecture; in this model, the LSTM hidden state size is 3.)
In the LSTM figure, we can see that there are 8 different weight matrices (4 associated with the hidden state and 4 associated with the input vector) and 4 different bias vectors. To better understand the operations inside the LSTM cell, we can look at the following equations.
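In the standard formulation (σ is the logistic sigmoid and ⊙ denotes element-wise multiplication; depending on the Keras version, the gate activation may be hard_sigmoid instead of σ), the cell computes:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$
Each of W_i, W_f, W_c, W_o multiplies the input vector x_t, and each of U_i, U_f, U_c, U_o multiplies the previous hidden state h_{t-1}, which accounts for the 8 weight matrices and 4 bias vectors.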
In general, there are no firm guidelines for determining the number of layers or the number of memory cells in an LSTM. The numbers required might depend on several aspects of the problem, such as the complexity of the dataset, the number of features, the number of data points, etc.
If you are using Keras 2.2.0, when you print
print(model.layers[0].trainable_weights)
you should see three tensors: lstm_1/kernel:0, lstm_1/recurrent_kernel:0 and lstm_1/bias:0.
One of the dimensions of each tensor is
4 * number_of_units
where number_of_units is the number of units (neurons) in the layer. Try:
units = int(int(model.layers[0].trainable_weights[0].shape[1]) / 4)
print("Number of units:", units)
That is because each tensor contains the weights for the four LSTM components, concatenated in this order:
i (input gate), f (forget gate), c (candidate cell state) and o (output gate)
Therefore, in order to extract the per-gate weights, you can simply use the slice operator:
# kernel (input weights), recurrent kernel and bias of the LSTM layer
W = model.layers[0].get_weights()[0]  # shape (feature_count, 4 * units)
U = model.layers[0].get_weights()[1]  # shape (units, 4 * units)
b = model.layers[0].get_weights()[2]  # shape (4 * units,)

# input weights per gate
W_i = W[:, :units]
W_f = W[:, units: units * 2]
W_c = W[:, units * 2: units * 3]
W_o = W[:, units * 3:]

# recurrent weights per gate
U_i = U[:, :units]
U_f = U[:, units: units * 2]
U_c = U[:, units * 2: units * 3]
U_o = U[:, units * 3:]

# biases per gate
b_i = b[:units]
b_f = b[units: units * 2]
b_c = b[units * 2: units * 3]
b_o = b[units * 3:]
Source: keras code
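To connect these slices back to the gate equations, here is a minimal NumPy sketch of a single LSTM step. It assumes the W_*, U_* and b_* arrays extracted above and uses a plain sigmoid for the gates; some Keras versions default to hard_sigmoid for the recurrent (gate) activation, so the outputs may not match Keras exactly.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    # Gate activations from the sliced input weights, recurrent weights and biases.
    i = sigmoid(x_t @ W_i + h_prev @ U_i + b_i)       # input gate
    f = sigmoid(x_t @ W_f + h_prev @ U_f + b_f)       # forget gate
    c_bar = np.tanh(x_t @ W_c + h_prev @ U_c + b_c)   # candidate cell state
    o = sigmoid(x_t @ W_o + h_prev @ U_o + b_o)       # output gate
    c_t = f * c_prev + i * c_bar                      # new cell state
    h_t = o * np.tanh(c_t)                            # new hidden state
    return h_t, c_t

# Example: one step from zero initial states with a random input vector.
x_t = np.random.randn(W.shape[0])
h_t, c_t = lstm_step(x_t, np.zeros(units), np.zeros(units))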