 

Do I need a stateful or stateless LSTM?

I am trying to make an LSTM for time series prediction in Keras. In particular, it should predict unseen values once the model is trained. A visualisation of the time series is shown below.

[Figure: the time series, with training data in blue and test data in orange]

The model is trained on the blue time series, and predictions are compared to the orange time series.

For predicting, I want to take the last n points of the training data (where n is the sequence length), run a prediction, and use this prediction for a consecutive (second) prediction, i.e.:

prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n+1))
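
In code, this recursive scheme would look roughly like the following (a sketch assuming a trained Keras model that takes input of shape (batch, seq_len, 1); model, train_series, seq_len and n_steps are placeholder names):

import numpy as np

def recursive_forecast(model, train_series, seq_len, n_steps):
    # Seed the window with the last seq_len observations of the training data
    window = list(train_series[-seq_len:])
    preds = []
    for _ in range(n_steps):
        x = np.array(window[-seq_len:]).reshape(1, seq_len, 1)
        yhat = model.predict(x, verbose=0)[0, 0]  # assumes a single scalar output
        preds.append(yhat)
        window.append(yhat)  # feed the prediction back in as the newest input
    return np.array(preds)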

I have tried to get this to work, but so far without success. I am at a loss as to whether I should use a stateful or stateless model, and what a good value for the sequence length might be. Does anyone have experience with this?

I have read and tried various tutorials, but none seem to be applicable to my kind of data.

Because I want to run consecutive predictions, I would need a stateful model to prevent Keras from resetting states after each call to model.predict, but training with a batch size of 1 takes forever... Or is there a way to circumvent this problem?

Asked by jorism1993 on Aug 26 '18

1 Answer

A stateful LSTM is used when the whole sequence plays a part in forming the output. Take an extreme case: you might have a 1000-length sequence, and the very first character of that sequence is what actually defines the output:

Stateful

If you were to batch this into 10 x 100-length sequences, then with a stateful LSTM the connections (state) between sequences in the batch would be retained, and it would (with enough examples) learn that the first character plays a significant role in determining the output. In effect, the sequence length is immaterial, because the network's state persists across the whole stretch of data; you simply batch it as a means of supplying the data.
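
For concreteness, here is a rough sketch of the stateful mechanics in Keras, assuming the 1000-step sequence has been pre-chunked into 10 x 100-length pieces (all data and targets below are placeholders; the key points are the fixed batch_input_shape and the manual state reset at the end of the full sequence):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, chunk_len, n_features = 1, 100, 1  # 10 chunks of 100 steps, fed one at a time

# Placeholder chunks of one 1000-step sequence; per-chunk targets are illustrative
X_chunks = [np.random.rand(batch_size, chunk_len, n_features) for _ in range(10)]
y_chunks = [np.random.rand(batch_size, 1) for _ in range(10)]

model = Sequential([
    # stateful=True requires a fixed batch_input_shape
    LSTM(32, stateful=True, batch_input_shape=(batch_size, chunk_len, n_features)),
    Dense(1),
])
model.compile(loss="mse", optimizer="adam")

for epoch in range(10):
    # Feed the chunks in order; the LSTM state carries over from chunk to chunk
    for chunk_x, chunk_y in zip(X_chunks, y_chunks):
        model.train_on_batch(chunk_x, chunk_y)
    # Reset only at the boundary of the full 1000-step sequence
    model.reset_states()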

Stateless

During training, the state is reset after each sequence. So in the example I've given, the network wouldn't learn that it's the first character of the 1000-length sequence that defines the output: it would never see that long-term dependency, because the first character and the final output value end up in separate sequences, and the state isn't retained between them.

Summary

What you need to determine is whether the data at the end of your time series is likely to depend on what happened right at the start.

I would say that such long-term dependencies are actually quite rare, and you're probably better off using a stateless LSTM and treating the sequence length as a hyperparameter, searching for the value that best models the data, i.e. yields the most accurate predictions on validation data. A rough sketch of that search follows.
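
This sketch assumes series is a 1-D numpy array of observations (the layer size, candidate lengths and placeholder data here are illustrative):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def make_windows(series, seq_len):
    # Slide a window over the series: X[i] = seq_len steps, y[i] = the next step
    X = np.array([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    return X[..., np.newaxis], y

series = np.sin(np.linspace(0, 50, 1000))  # placeholder data; substitute your own series
best = None
for seq_len in (10, 25, 50, 100):  # candidate sequence lengths to try
    X, y = make_windows(series, seq_len)
    model = Sequential([LSTM(32, input_shape=(seq_len, 1)), Dense(1)])
    model.compile(loss="mse", optimizer="adam")
    hist = model.fit(X, y, epochs=20, validation_split=0.2, verbose=0)
    val_loss = min(hist.history["val_loss"])
    if best is None or val_loss < best[1]:
        best = (seq_len, val_loss)
print("best sequence length:", best[0])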

Answered by BigBadMe on Oct 23 '22