I'm trying to implement a simple LSTM prediction model in Keras for time series. I have 10 time series with a lookback_window=28, and the number of features is 1. I need to predict the next value after each window (timesteps=28, n_features=1). Here is my model and the way I tried to train it:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras import callbacks

model = Sequential()
model.add(LSTM(28, batch_input_shape=(49, 28, 1), stateful=True, return_sequences=True))
model.add(LSTM(14, stateful=True))
model.add(Dense(1, activation='relu'))
earlyStopping = callbacks.EarlyStopping(monitor='val_loss', patience=100, verbose=1, mode='auto')
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(train_data, train_y,
                    epochs=1000,
                    callbacks=[earlyStopping],
                    batch_size=49,
                    validation_data=(validation_data, validation_y),
                    verbose=1,
                    shuffle=False)
prediction_result = model.predict(test_data, batch_size=49)
I'm not resetting the states after each epoch and I'm not shuffling, because the order within the time series matters and consecutive windows are connected. The problem is that the loss sometimes changes slightly after the first epoch and then stays constant; most of the time it doesn't change at all. I tried a different optimizer (RMSprop), changed its learning rate, removed early stopping to let it train longer, changed the batch_size (and even trained without batching), tried the same model stateless, set shuffle=True, added more layers to make it deeper, ... but none of that made any difference. What am I doing wrong? Any suggestions?
P.S. My data consists of 10 time series, each of length 567:
timeseries#1: 451, 318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, ....
timeseries#2: 304, 274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, ....
...
timeseries#10: 208, 138, 201, 342, 280, 282, 280, 140, 124, 261, 193, .....
My lookback window is 28, so I generated the following sequences with 28 timesteps (a sketch of the windowing code follows the examples below):
[451, 318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, .... ]
[318, 404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, 56, ....]
[404, 199, 225, 158, 357, 298, 339, 155, 135, 239, 306, 56, 890, ....]
...
[304, 274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, ....]
[274, 150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, 127, ....]
[150, 143, 391, 357, 278, 557, 98, 106, 305, 288, 325, 127, 798, ....]
...
[208, 138, 201, 342, 280, 282, 280, 140, 124, 261, 193, .....]
[138, 201, 342, 280, 282, 280, 140, 124, 261, 193, 854, .....]
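For clarity, this is roughly how I generate the windows (the helper make_windows and the list list_of_10_timeseries are just illustrative names, not my exact code):

import numpy as np

def make_windows(series, lookback=28):
    # Slide a window of `lookback` steps over one timeseries and return
    # inputs of shape (samples, lookback, 1) plus the next-step targets.
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X).reshape(-1, lookback, 1), np.array(y)

all_X, all_y = [], []
for series in list_of_10_timeseries:   # each series has length 567
    X, y = make_windows(series, lookback=28)
    all_X.append(X)
    all_y.append(y)
data = np.concatenate(all_X)           # shape (5390, 28, 1): 539 windows per series
y = np.concatenate(all_y)              # shape (5390,)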
Then I split my data as follows (data.shape=(5390,28,1), i.e. 539 windows for each of the 10 time series):
num_training_ts = int(data.shape[0] / 539 * (1 - config['validation_split_ratio']))
train_size = num_training_ts * 539
train_data = data[:train_size, :, :]
train_y = y[:train_size]
validation_data = data[train_size:-1*539, :, :]
validation_y = y[train_size:-1*539]
test_data = data[-1*539:, :, :] # The last timeseries
test_y = y[-1*539:]
I scaled the data to between -1 and 1 using a MinMaxScaler (a rough sketch follows the shapes below), but here for simplicity I'm showing the actual values. At the end I have the following:
train_data.shape=(3234,28,1)
train_y.shape=(3234,)
test_data.shape=(539,28,1)
test_y.shape=(539,)
validation_data.shape=(1617,28,1)
validation_y.shape=(1617,)
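The scaling step itself looks roughly like this (a sketch assuming scikit-learn's MinMaxScaler, fit on the training data only and then applied to the validation and test sets):

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))

# MinMaxScaler expects 2D input, so flatten, scale, and reshape back.
train_data = scaler.fit_transform(train_data.reshape(-1, 1)).reshape(train_data.shape)
validation_data = scaler.transform(validation_data.reshape(-1, 1)).reshape(validation_data.shape)
test_data = scaler.transform(test_data.reshape(-1, 1)).reshape(test_data.shape)

# The targets are scaled with the same scaler so they also end up in [-1, 1].
train_y = scaler.transform(train_y.reshape(-1, 1)).ravel()
validation_y = scaler.transform(validation_y.reshape(-1, 1)).ravel()
test_y = scaler.transform(test_y.reshape(-1, 1)).ravel()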
When I run into this kind of issue, I first look at the data: is it scaled? Is there enough of it for this model?
Then I look at the model. In your case it seems that all of the learning happens in the first iteration, so why not try changing the learning rate and the decay of your optimizer?
With Keras that's easy. First define the optimizer (in your code you used 'adam'):
import keras

my_adam_optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
then use it in the compile function:
model.compile(loss='mean_squared_error', optimizer=my_adam_optimizer)
UPDATE:
The relu activation on the last layer 'cuts off' negative values, so if your target contains negatives the model cannot predict them. Earlier in the question you said you scaled the data between -1 and 1 with MinMaxScaler, and that is certainly causing the problem. If you drop the activation parameter you get the default, which is 'linear'.
Removing the relu activation from the last layer should fix the problem!
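For example, keeping the rest of your architecture unchanged, the model would become something like:

model = Sequential()
model.add(LSTM(28, batch_input_shape=(49, 28, 1), stateful=True, return_sequences=True))
model.add(LSTM(14, stateful=True))
model.add(Dense(1))   # no activation -> default 'linear', so negative targets can be predicted
model.compile(loss='mean_squared_error', optimizer=my_adam_optimizer)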