Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keras LSTM: a time-series multi-step multi-features forecasting - poor results

I have a time series dataset containing data from a whole year (date is the index). The data was measured every 15 min (during whole year) which results in 96 timesteps a day. The data is already normalized. The variables are correlated. All the variables except the VAR are weather measures.

VAR is seasonal in a day period and in a week period (as it looks a bit different on weekend, but more less the same every weekend). VAR values are stationary. I would like to predict values of VAR for next two days (192 steps ahead) and for next seven days (672 steps ahead).

Here is the sample of the dataset:

DateIdx               VAR       dewpt       hum         press       temp
2017-04-17 00:00:00   0.369397  0.155039    0.386792    0.196721    0.238889
2017-04-17 00:15:00   0.363214  0.147287    0.429245    0.196721    0.233333
2017-04-17 00:30:00   0.357032  0.139535    0.471698    0.196721    0.227778
2017-04-17 00:45:00   0.323029  0.127907    0.429245    0.204918    0.219444
2017-04-17 01:00:00   0.347759  0.116279    0.386792    0.213115    0.211111
2017-04-17 01:15:00   0.346213  0.127907    0.476415    0.204918    0.169444
2017-04-17 01:30:00   0.259660  0.139535    0.566038    0.196721    0.127778
2017-04-17 01:45:00   0.205564  0.073643    0.523585    0.172131    0.091667
2017-04-17 02:00:00   0.157650  0.007752    0.481132    0.147541    0.055556
2017-04-17 02:15:00   0.122101  0.003876    0.476415    0.122951    0.091667

Input dataset plot

I have decided to use LSTM in Keras. Having data from the whole year, I have used data from past 329 days as a training data and the rest for a validation during the training. train_X -> contains whole measures including VAR from 329 days train_Y -> contains only VAR from 329 days. The value is shifted one step ahead. The rest timesteps goes to test_X and test_Y.

Here is the code I prepare train_X and train_Y:

#X -> is the whole dataframe
#Y -> is a vector of VAR from whole dataframe, already shifted 1 step ahead

#329 * 96 = 31584
train_X = X[:31584]
train_X = train_X.reshape(train_X.shape[0],1,5)
train_Y = Y[:31584]  
train_Y = train_Y.reshape(train_Y.shape[0],1)

To predict next VAR value I would like to use past 672 timesteps (whole week measures). For this reason I have set batch_size=672, so that the ‘fit’ command look like this:

history = model.fit(train_X, train_Y, epochs=50, batch_size=672, validation_data=(test_X, test_Y), shuffle=False)  

Here is the architecture of my network:

model = models.Sequential()
model.add(layers.LSTM(672, input_shape=(None, 672), return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(336, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(168, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(84, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(21, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mae', optimizer='adam')
model.summary()

From the plot below we can see that the network has learn ‘something’ after 50 epochs:

Plot from the learning process

For the prediction purpose I have prepared a set of data containing last 672 steps with all values and 96 without VAR value – which should be predicted. I also used autoregression, so I updated VAR after each prediction and used it for next prediction.

The predX dataset (used for prediction) looks like this:

print(predX['VAR'][668:677])

DateIdx            VAR
2017-04-23 23:00:00    0.307573
2017-04-23 23:15:00    0.278207
2017-04-23 23:30:00    0.284390
2017-04-23 23:45:00    0.309118
2017-04-24 00:00:00         NaN
2017-04-24 00:15:00         NaN
2017-04-24 00:30:00         NaN
2017-04-24 00:45:00         NaN
2017-04-24 01:00:00         NaN
Name: VAR, dtype: float64

Here is the code (autoregression) I have used to predict next 96 steps:

stepsAhead = 96
historySteps = 672

for i in range(0,stepsAhead):
    j = i + historySteps
    ypred = model.predict(predX.values[i:j].reshape(1,historySteps,5))
    predX['VAR'][j] = ypred

Unfortunately the results are very poor and very far from the expectations:

A Plot with predicted data

Results combined with a previous day:

Predicted data combined with a previous day

Except from the ‘What have I done wrong‘ question, I would like to ask a few questions:

Q1. During model fifing, I have just put the whole history in batches of 672 size. Is it correct? How should I organize the dataset for the model fitting? What options do I have? Should I use the “sliding window” approach (like in the link here: https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/ )?

Q2. Is the 50 epochs enough? What is the common practice here? Maybe the network is underfitted resulting in poor prediction? So far I tried 200 epoch with the same result.

Q3. Should I try a different architecture? Is the proposed network ‘big enough’ to handle such a data? Maybe a “stateful” network is the right approach here?

Q4. Did I implement the autoregression correctly? Is there any other approach to make a prediction for many steps ahead e.g. 192 or 672 like in my case?

like image 622
Bolesław Maliszewski Avatar asked Jun 07 '18 13:06

Bolesław Maliszewski


People also ask

Can LSTM be used for multivariate forecasting?

Long Short-Term Memory (LSTM) has proven to be a reliable model for forecasting the incidence of diseases [22, 23]. The unique forget gate structure may make the model perform well in the long-term prediction. Using a multivariate multistep LSTM model may obtain better prediction results.

Is LSTM Good for forecasting?

“The LSTM cell adds long-term memory in an even more performant way because it allows even more parameters to be learned. This makes it the most powerful [Recurrent Neural Network] to do forecasting, especially when you have a longer-term trend in your data.

Is LSTM better than prophet?

Prophet's advantage is that it requires less hyperparameter tuning as it is specifically designed to detect patterns in business time series. LSTM-based recurrent neural networks are probably the most powerful approach to learning from sequential data and time series are only a special case.

Is LSTM better than Arima?

It was shown from the forecasting performance in 2019 that ARIMA was better than LSTM for monthly and weekly forecasting while the LSTM was better than ARIMA for daily forecasting in rolling forecasting models.


2 Answers

It looks like there is a confusion on how to organise the data to train a RNN. So let's cover the questions:

  1. Once you have a 2D dataset (total_samples, 5) you can use the TimeseriesGenerator to create a sliding window what will generate (batch_size, past_timesteps, 5) for you. In this case, you will use .fit_generator to train the network.
  2. If you get the same result, 50 epochs should be fine. You usually adjust based on the performance of your network. But you should keep it fixed if you are comparing two different network architectures.
  3. Architecture is really large as you aim to predict all 672 future values at once. You can design the network so it learns to predict one measurement at a time. At prediction time you can predict one point and feed that again to predict the next until you get 672.
  4. This ties into answer 3, you can learn to predict one at a time and chain the predictions to n number of predictions after training.

The single point prediction model could look like:

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(past_timesteps, 5))
model.add(LSTM(64))
model.add(Dense(1))
like image 152
nuric Avatar answered Nov 16 '22 04:11

nuric


1) Batches are not the sequences. The input X is the sequence. The input should have the shape [None, sequence_length, number_of_features]. The 1st axis will be filled in by Keras with the batches. But they are not the sequences. The sequences are on the 2nd axis. The 3rd axis are the feature columns. Batch size 672 might be too large. You can try smaller values 128, 64, or 32.

2) Almost certain your network overfits. The network has too many LSTM layers. I would try just 2 layers of LSTM as @nuric suggested and see how it performs.

3) There also seems a confusion about the LSTM units (or LSTM size). It does not have to be 672. In fact, 672 is too large. A good starting point is 128.

4) The NN architecture is predicting a single value of VAR. In that case, make sure your Y have a single value for each sequence of X.

5) Alternatively you can make the last LSTM to output a sequence. In that case, each Y entry is a VAR sequence shifted one-step ahead. Going back to 4), make sure Y has the correct shape corresponding with that of X and the NN architecture.

6) You plot shows 50 epochs are enough for converging. Once you adjust X, Y, and the NN, do the same thing for watching the number of epochs.

7) Lastly an idea about the dates. If you want to include the dates in X, one idea is to one-hot encode them into week days. So your X would be [dewpt, hum, press, temp, MON, TUE, ..., SAT, SUN].

like image 35
neurite Avatar answered Nov 16 '22 03:11

neurite