I have a time series dataset covering a whole year, with the date as the index. The data was measured every 15 minutes throughout the year, which gives 96 timesteps per day. The data is already normalized, and the variables are correlated. All variables except VAR are weather measurements.
VAR is seasonal with both a daily and a weekly period (it looks a bit different on weekends, but more or less the same from one weekend to the next). The VAR values are stationary. I would like to predict the values of VAR for the next two days (192 steps ahead) and for the next seven days (672 steps ahead).
Here is a sample of the dataset:
DateIdx VAR dewpt hum press temp
2017-04-17 00:00:00 0.369397 0.155039 0.386792 0.196721 0.238889
2017-04-17 00:15:00 0.363214 0.147287 0.429245 0.196721 0.233333
2017-04-17 00:30:00 0.357032 0.139535 0.471698 0.196721 0.227778
2017-04-17 00:45:00 0.323029 0.127907 0.429245 0.204918 0.219444
2017-04-17 01:00:00 0.347759 0.116279 0.386792 0.213115 0.211111
2017-04-17 01:15:00 0.346213 0.127907 0.476415 0.204918 0.169444
2017-04-17 01:30:00 0.259660 0.139535 0.566038 0.196721 0.127778
2017-04-17 01:45:00 0.205564 0.073643 0.523585 0.172131 0.091667
2017-04-17 02:00:00 0.157650 0.007752 0.481132 0.147541 0.055556
2017-04-17 02:15:00 0.122101 0.003876 0.476415 0.122951 0.091667
[Input dataset plot]
I have decided to use an LSTM in Keras. Having data from the whole year, I used the data from the past 329 days for training and the rest for validation during training: train_X contains all measurements (including VAR) from the 329 days, and train_Y contains only VAR from the same 329 days, shifted one step ahead. The remaining timesteps go into test_X and test_Y.
Here is the code I used to prepare train_X and train_Y:
# X -> the whole dataframe (as an array of values)
# Y -> the VAR vector from the whole dataframe, already shifted 1 step ahead
# 329 days * 96 steps = 31584
train_X = X[:31584]
train_X = train_X.reshape(train_X.shape[0], 1, 5)
train_Y = Y[:31584]
train_Y = train_Y.reshape(train_Y.shape[0], 1)
To predict the next VAR value I would like to use the past 672 timesteps (a whole week of measurements). For this reason I set batch_size=672, so the fit call looks like this:
history = model.fit(train_X, train_Y, epochs=50, batch_size=672, validation_data=(test_X, test_Y), shuffle=False)
Here is the architecture of my network:
model = models.Sequential()
model.add(layers.LSTM(672, input_shape=(None, 672), return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(336, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(168, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(84, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(21, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mae', optimizer='adam')
model.summary()
From the plot below we can see that the network has learned 'something' after 50 epochs:
[Plot from the learning process]
For prediction I prepared a dataset containing the last 672 steps with all values, followed by 96 steps without the VAR value, which should be predicted. I also used autoregression: I updated VAR after each prediction and used it for the next prediction.
The predX dataset (used for prediction) looks like this:
print(predX['VAR'][668:677])
DateIdx VAR
2017-04-23 23:00:00 0.307573
2017-04-23 23:15:00 0.278207
2017-04-23 23:30:00 0.284390
2017-04-23 23:45:00 0.309118
2017-04-24 00:00:00 NaN
2017-04-24 00:15:00 NaN
2017-04-24 00:30:00 NaN
2017-04-24 00:45:00 NaN
2017-04-24 01:00:00 NaN
Name: VAR, dtype: float64
Here is the (autoregressive) code I used to predict the next 96 steps:
stepsAhead = 96
historySteps = 672
for i in range(0, stepsAhead):
    j = i + historySteps
    ypred = model.predict(predX.values[i:j].reshape(1, historySteps, 5))
    predX['VAR'][j] = ypred
Unfortunately, the results are very poor and very far from my expectations:
[Plot with predicted data]
Results combined with the previous day:
[Predicted data combined with the previous day]
Apart from the 'What have I done wrong?' question, I would like to ask a few more:
Q1. During model fitting, I just fed in the whole history in batches of size 672. Is that correct? How should I organize the dataset for model fitting? What options do I have? Should I use the "sliding window" approach (as in the link here: https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/ )?
Q2. Are 50 epochs enough? What is the common practice here? Maybe the network is underfitted, resulting in poor predictions? So far I have tried 200 epochs with the same result.
Q3. Should I try a different architecture? Is the proposed network 'big enough' to handle such data? Maybe a "stateful" network is the right approach here?
Q4. Did I implement the autoregression correctly? Is there another approach to predicting many steps ahead, e.g. 192 or 672 as in my case?
It looks like there is some confusion about how to organise the data to train an RNN. So let's cover the questions:
If the data has the shape (total_samples, 5), you can use the TimeseriesGenerator to create a sliding window that will generate batches of shape (batch_size, past_timesteps, 5) for you. In this case, you will use fit_generator to train the network, and after training you can call the model in a loop to obtain any number of predictions. The single-point prediction model could look like:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(past_timesteps, 5)))
model.add(LSTM(64))
model.add(Dense(1))
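A minimal sketch of how the generator could be wired up, assuming X is the (total_samples, 5) feature array and var is the unshifted VAR column (both names are placeholders):

from keras.preprocessing.sequence import TimeseriesGenerator

past_timesteps = 672  # one week of 15-minute steps

# TimeseriesGenerator pairs the window X[i:i+length] with the target
# var[i+length], i.e. the step right after the window, so the target
# series does not need to be shifted beforehand
train_gen = TimeseriesGenerator(X, var, length=past_timesteps, batch_size=32)
model.fit_generator(train_gen, epochs=50)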
1) Batches are not the sequences. The input X is the sequence. The input should have the shape [None, sequence_length, number_of_features]. The 1st axis will be filled in by Keras with the batches; the batches are not the sequences. The sequences are on the 2nd axis, and the 3rd axis holds the feature columns. A batch size of 672 might be too large; you can try smaller values such as 128, 64, or 32.
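As a shape-only illustration (random data, hypothetical sizes):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

sequence_length, n_features = 672, 5

model = Sequential()
# input_shape describes one sample: (timesteps, features);
# Keras adds the batch axis itself, so the model sees (None, 672, 5)
model.add(LSTM(128, input_shape=(sequence_length, n_features)))
model.add(Dense(1))

# a batch of 32 sequences -> one prediction per sequence
batch = np.random.rand(32, sequence_length, n_features)
print(model.predict(batch).shape)  # (32, 1)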
2) It is almost certain that your network overfits. The network has too many LSTM layers. I would try just 2 LSTM layers, as @nuric suggested, and see how it performs.
3) There also seems to be some confusion about the LSTM units (i.e. the layer size). It does not have to be 672; in fact, 672 is too large. A good starting point is 128.
4) The NN architecture is predicting a single value of VAR. In that case, make sure your Y has a single value for each sequence in X, as sketched below.
5) Alternatively, you can make the last LSTM output a sequence. In that case, each Y entry is a VAR sequence shifted one step ahead. Going back to 4), make sure Y has the correct shape corresponding to that of X and the NN architecture; see the sketch after this point.
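A sketch of that sequence-output variant, under the same assumed shapes as above:

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(672, 5)))
model.add(LSTM(64, return_sequences=True))  # keep the sequence here too
model.add(TimeDistributed(Dense(1)))  # one VAR value per timestep
model.compile(loss='mae', optimizer='adam')
# each Y entry must then have shape (672, 1): the VAR window shifted one step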
6) Your plot shows that 50 epochs are enough for convergence. Once you adjust X, Y, and the NN, do the same thing to watch the number of epochs.
7) Lastly, an idea about the dates: if you want to include the dates in X, one idea is to one-hot encode them into weekdays, so your X would be [dewpt, hum, press, temp, MON, TUE, ..., SAT, SUN].
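With a pandas DatetimeIndex this could be done roughly as follows (df is assumed to be the original dataframe from the question):

import pandas as pd

# dayofweek gives 0..6 for Monday..Sunday; one column per weekday
weekdays = pd.get_dummies(df.index.dayofweek, prefix='wd')
weekdays.index = df.index

# drop VAR from the inputs and append the 7 weekday indicators
X = pd.concat([df[['dewpt', 'hum', 'press', 'temp']], weekdays], axis=1)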