I am using an LSTM on time series data, and I also have features about each series that are not time-dependent. Imagine company stocks for the series and something like the company's location among the non-time-series features. This is not the actual use case, but it is the same idea. For this example, let's just predict the next value in the time series.
So a simple example would be:
from keras.layers import Input, Dense, LSTM
from keras.models import Model

feature_input = Input(shape=(None, data.training_features.shape[1]))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)
series_input = Input(shape=(None, data.training_series.shape[1]))
lstm = LSTM(8)(series_input, initial_state=dense_2)
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
However, I am not sure how to specify the initial state correctly. I get
ValueError: An initial_state was passed that is not compatible with `cell.state_size`. Received `state_spec`=[<keras.engine.topology.InputSpec object at 0x11691d518>]; However `cell.state_size` is (8, 8)
which I can see is caused by the 3D batch dimension. I tried using Flatten, Permute, and Reshape layers, but I don't believe that is correct. What am I missing, and how can I connect these layers?
The first problem is that an LSTM(8) layer expects two initial states, h_0 and c_0, each of shape (None, 8). That's what "cell.state_size is (8, 8)" means in the error message.
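A quick way to see the two states: if you set return_state=True, the layer returns them explicitly (a minimal sketch, using a toy feature size of 5):

from keras.layers import Input, LSTM

x = Input(shape=(None, 5))
# return_state=True makes the call return [output, h, c]:
# the final hidden state h and the final cell state c, each of shape (None, 8)
output, h, c = LSTM(8, return_state=True)(x)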
If you only have one initial state dense_2, maybe you can switch to GRU (which requires only h_0). Or, you can transform your feature_input into two initial states.
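A minimal sketch of the GRU route (toy feature size of 5, and a 2D feature input to sidestep the shape problem described next):

from keras.layers import Input, Dense, GRU

feature_input = Input(shape=(5,))  # 2D features; see the second problem below
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)
series_input = Input(shape=(None, 5))
# GRU carries only a hidden state, so a single tensor is a valid initial_state
gru = GRU(8)(series_input, initial_state=dense_2)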
The second problem is that h_0 and c_0 are of shape (batch_size, 8), but your dense_2 is of shape (batch_size, timesteps, 8). You need to deal with the time dimension before using dense_2 as the initial states.
So maybe you can change your input shape into (data.training_features.shape[1],), or take the average over timesteps with GlobalAveragePooling1D.
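If your features really do come with a time dimension, a sketch of the pooling route (same toy shapes; the pooled tensor is then a valid 2D state input):

from keras.layers import Input, Dense, GlobalAveragePooling1D

feature_input = Input(shape=(None, 5))            # (batch_size, timesteps, 5)
pooled = GlobalAveragePooling1D()(feature_input)  # (batch_size, 5)
dense_1 = Dense(4, activation='relu')(pooled)
dense_2 = Dense(8, activation='relu')(dense_1)    # (batch_size, 8), usable as a state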
A working example would be:
from keras.layers import Input, Dense, LSTM
from keras.models import Model

# Non-time-dependent features: shape (batch_size, 5)
feature_input = Input(shape=(5,))
# Two separate branches produce the initial hidden state h_0 and cell state c_0
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)
# Time series input: shape (batch_size, timesteps, 5)
series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
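Training then passes the two inputs as a list, in the same order as the Model's inputs (data.training_targets here is a hypothetical name for your target array):

# feature array: (num_samples, 5); series array: (num_samples, timesteps, 5)
model.fit([data.training_features, data.training_series],
          data.training_targets,  # hypothetical target array
          epochs=10, batch_size=32)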