LSTM Initial state from Dense layer

I am using an LSTM on time series data. I also have features about each time series that are not time dependent. Imagine company stock prices for the series and something like company location for the non-time-series features. This is not my actual use case, but it is the same idea. For this example, let's just predict the next value in the time series.

So a simple example would be:

feature_input = Input(shape=(None, data.training_features.shape[1]))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)

series_input = Input(shape=(None, data.training_series.shape[1]))
lstm = LSTM(8)(series_input, initial_state=dense_2)
out = Dense(1, activation="sigmoid")(lstm)

model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])

However, I am not sure how to specify the initial state correctly. I get

ValueError: An initial_state was passed that is not compatible with `cell.state_size`. Received `state_spec`=[<keras.engine.topology.InputSpec object at 0x11691d518>]; However `cell.state_size` is (8, 8)

which I think is caused by dense_2 being a 3D tensor (batch, timesteps, units). I tried using Flatten, Permute, and Reshape layers, but I don't believe that is correct. What am I missing, and how can I connect these layers?

modesitt asked Jan 12 '18 20:01


1 Answer

The first problem is that an LSTM(8) layer expects two initial states h_0 and c_0, each of dimension (None, 8). That's what it means by "cell.state_size is (8, 8)" in the error message.

If you only have one initial state dense_2, maybe you can switch to GRU (which requires only h_0). Or, you can transform your feature_input into two initial states.
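A minimal sketch of the GRU alternative, assuming 5 static features and 5 series features per timestep (placeholder sizes, not taken from your data). The name h_0 is just illustrative, and the imports assume the standalone keras package; with TensorFlow-bundled Keras they would come from tensorflow.keras instead.

```python
import numpy as np
from keras.layers import Input, Dense, GRU
from keras.models import Model

# Static features as a 2D input: (batch_size, 5). The size 5 is a placeholder.
feature_input = Input(shape=(5,))
dense_1 = Dense(4, activation='relu')(feature_input)
h_0 = Dense(8, activation='relu')(dense_1)  # (batch_size, 8), matches GRU(8)

# Time series: (batch_size, timesteps, 5)
series_input = Input(shape=(None, 5))
# A GRU has a single state, so one tensor is enough for initial_state.
gru = GRU(8)(series_input, initial_state=[h_0])
out = Dense(1, activation='sigmoid')(gru)

model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')

# Quick shape check on dummy data: 2 samples, 7 timesteps.
dummy_features = np.zeros((2, 5), dtype='float32')
dummy_series = np.zeros((2, 7, 5), dtype='float32')
preds = model.predict([dummy_features, dummy_series], verbose=0)
```

The number of units in the last Dense layer has to equal the number of GRU units, otherwise the state shapes will not match.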

The second problem is that h_0 and c_0 are of shape (batch_size, 8), but your dense_2 is of shape (batch_size, timesteps, 8). You need to deal with the time dimension before using dense_2 as initial states.

So maybe you can change your input shape to (data.training_features.shape[1],), or take the average over timesteps with GlobalAveragePooling1D.

A working example would be:

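from keras.layers import Input, Dense, LSTM
from keras.models import Model

# Static (non-time-series) features: 2D input of shape (batch_size, 5)
feature_input = Input(shape=(5,))
# Two separate Dense stacks, one for each LSTM state
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)  # h_0: (batch_size, 8)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)  # c_0: (batch_size, 8)

# Time series: 3D input of shape (batch_size, timesteps, 5)
series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input,series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])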
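
If your features really do arrive with a time axis, as in the question, the GlobalAveragePooling1D option mentioned above could look roughly like this. It is only a sketch: the sizes are placeholders, the names pooled, state_h and state_c are made up for illustration, and averaging over timesteps is just one way to collapse the time dimension.

```python
import numpy as np
from keras.layers import Input, Dense, LSTM, GlobalAveragePooling1D
from keras.models import Model

# Features with a time axis: (batch_size, timesteps, 5)
feature_input = Input(shape=(None, 5))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)  # (batch_size, timesteps, 8)

# Average over the time axis so the result is 2D: (batch_size, 8)
pooled = GlobalAveragePooling1D()(dense_2)

# One Dense head per LSTM state, each of shape (batch_size, 8)
state_h = Dense(8, activation='relu')(pooled)
state_c = Dense(8, activation='relu')(pooled)

series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[state_h, state_c])
out = Dense(1, activation='sigmoid')(lstm)

model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')

# Quick shape check on dummy data: 2 samples, 3 feature steps, 7 series steps.
dummy_features = np.zeros((2, 3, 5), dtype='float32')
dummy_series = np.zeros((2, 7, 5), dtype='float32')
preds = model.predict([dummy_features, dummy_series], verbose=0)
```

The feature and series inputs do not need the same number of timesteps here, because the pooling removes the time axis before the states reach the LSTM.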

Yu-Yang answered Nov 05 '22 08:11