I am trying to do multi-step time series forecasting using a multivariate LSTM in Keras. Specifically, I originally have two variables (var1 and var2) for each time step. Following the online tutorial here, I decided to use the data at times (t-2) and (t-1) to predict the value of var2 at time step t. As the sample data table shows, I am using the first 4 columns as input and the last column, var2(t), as the output (Y). The code I have developed can be seen here, but I have three questions.
   var1(t-2)  var2(t-2)  var1(t-1)  var2(t-1)  var2(t)
2        1.5       -0.8        0.9       -0.5     -0.2
3        0.9       -0.5       -0.1       -0.2      0.2
4       -0.1       -0.2       -0.3        0.2      0.4
5       -0.3        0.2       -0.7        0.4      0.6
6       -0.7        0.4        0.2        0.6      0.7
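For reference, a table like this can be built with simple pandas shifts. A minimal sketch, assuming the raw series live in a DataFrame named df with columns var1 and var2 (the names are illustrative):
import pandas as pd

# shift the raw series to get the lagged inputs and the target column
supervised = pd.DataFrame({
    'var1(t-2)': df['var1'].shift(2),
    'var2(t-2)': df['var2'].shift(2),
    'var1(t-1)': df['var1'].shift(1),
    'var2(t-1)': df['var2'].shift(1),
    'var2(t)':   df['var2'],
}).dropna()  # the first two rows have no full history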
Update: LSTM result (blue line is the training sequence, orange line is the ground truth, green is the prediction)
From your table, I see you have a sliding window over a single sequence, making many smaller sequences with 2 steps.
If you're not using the table: see question 3
Assuming you're using that table as input, where it's clearly a sliding window case taking two time steps as input, your timeSteps is 2.
You should probably work as if var1 and var2 were features in the same sequence:
input_shape = (2,2) - two time steps and two features/vars.

You don't need to make tables like that or build a sliding window case, though; that is just one possible approach.
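If you do go the table route, a minimal sketch of turning it into that input, assuming table is the supervised DataFrame built above (the 32 units, epoch count, and batch size are arbitrary choices):
from keras.models import Sequential
from keras.layers import LSTM, Dense

values = table.values.astype('float32')
X = values[:, :4].reshape(-1, 2, 2)  # (samples, timeSteps=2, features=2)
Y = values[:, 4]                     # var2(t)

model = Sequential()
model.add(LSTM(32, input_shape=(2, 2)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(X, Y, epochs=100, batch_size=16)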
Your model is actually capable of learning things and deciding the size of this window itself.
If, on one hand, your model is capable of learning long time dependencies, allowing you not to use windows, on the other hand it may learn to identify different behaviors at the beginning and at the middle of a sequence. In this case, if you want to predict using sequences that start from the middle (not including the beginning), your model may work as if it were the beginning and predict a different behavior. Using windows eliminates this very long influence. Which is better may depend on testing, I guess.
Not using windows:
If your data has 800 steps, feed all 800 steps at once for training.

Here, we will need two separate models: one for training, another for predicting. In training, we will take advantage of the parameter return_sequences=True. This means that for each input step, we will get an output step.

For predicting later, we will want only one output; then we will use return_sequences=False. And in case we are going to use the predicted outputs as inputs for following steps, we are going to use a stateful=True layer.
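To make the shape difference concrete, a quick sketch (the unit count of 4 and the 10 steps are arbitrary):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

seq = np.zeros((1, 10, 2))  # 1 sequence, 10 steps, 2 features

withSeq = Sequential([LSTM(4, input_shape=(None, 2), return_sequences=True)])
print(withSeq.predict(seq).shape)   # (1, 10, 4) - one output per input step

lastOnly = Sequential([LSTM(4, input_shape=(None, 2), return_sequences=False)])
print(lastOnly.predict(seq).shape)  # (1, 4) - only the last step's output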
Training:
Have your input data shaped as (1, 799, 2): 1 sequence, taking the steps from 1 to 799, with both vars in the same sequence (2 features).

Have your target data (Y) shaped also as (1, 799, 2), taking the same steps shifted, from 2 to 800.
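For instance, a sketch of that shaping, assuming data is a NumPy array of shape (800, 2) holding both vars (the name is illustrative):
X = data[:-1].reshape(1, 799, 2)  # steps 1 to 799
Y = data[1:].reshape(1, 799, 2)   # the same steps shifted by one: 2 to 800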
Build a model with return_sequences=True. You may use timeSteps=799, but you may also use None (allowing a variable amount of steps).
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(units, input_shape=(None,2), return_sequences=True))
model.add(LSTM(2, return_sequences=True)) #it could be a Dense(2) too....
....
model.fit(X, Y, ....)
Predicting:
For predicting, create a similar model, now with return_sequences=False.
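Concretely, it might be built like this (a sketch, assuming the same layer sizes as the training model above):
newModel = Sequential()
newModel.add(LSTM(units, input_shape=(None, 2), return_sequences=True))
newModel.add(LSTM(2, return_sequences=False))  # only this flag changes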
Copy the weights:
newModel.set_weights(model.get_weights())
You can make an input with length 800, for instance (shape: (1,800,2)) and predict just the next step:

step801 = newModel.predict(X)
If you want to predict more, we are going to use stateful=True layers. Use the same model again, now with return_sequences=False (only in the last LSTM; the others keep True) and stateful=True (all of them). Replace input_shape with batch_input_shape=(1, None, 2).
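A sketch of building it (same layer sizes assumed as before; weights copied the same way):
statefulModel = Sequential()
statefulModel.add(LSTM(units, batch_input_shape=(1, None, 2),
                       return_sequences=True, stateful=True))
statefulModel.add(LSTM(2, return_sequences=False, stateful=True))
statefulModel.set_weights(model.get_weights())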
#with stateful=True, your model will never think that the sequence ended
#each new batch will be seen as new steps instead of new sequences
#because of this, we need to call this when we want a sequence starting from zero:
statefulModel.reset_states()
#predicting
X = steps1to800 #input
step801 = statefulModel.predict(X).reshape(1,1,2)
step802 = statefulModel.predict(step801).reshape(1,1,2)
step803 = statefulModel.predict(step802).reshape(1,1,2)
#the reshape is because return_sequences=False eliminates the step dimension
Actually, you could do everything with a single stateful=True and return_sequences=True model, taking care of two things:

- Remember to call reset_states() for every epoch. (Train with a manual loop and epochs=1; a sketch follows below.)
- When predicting, take only the last step of the output as the desired result, since return_sequences=True gives you outputs for all steps.