Why does Keras LSTM batch size used for prediction have to be the same as fitting batch size?

Tags:

deep-learning

keras

lstm

When using a Keras LSTM to predict on time series data, I've been getting errors when I train the model with a batch size of 50 and then try to predict with the same model using a batch size of 1 (i.e. just predicting the next value).

Why am I not able to train and fit the model with multiple batches at once, and then use that model to predict with anything other than the same batch size? It doesn't seem to make sense, but I could easily be missing something here.

Edit: this is the model. batch_size is 50, and sl is the sequence length, which is currently set to 20.

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    model = Sequential()
    model.add(LSTM(1, batch_input_shape=(batch_size, 1, sl), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, verbose=2)

Here is the line for predicting on the training set (used to compute the RMSE):

    # make predictions
    trainPredict = model.predict(trainX, batch_size=batch_size)

Here is the actual prediction of unseen time steps:

    for i in range(test_len):
        print('Prediction %s: ' % str(pred_count))
        next_pred_res = np.reshape(next_pred, (next_pred.shape[1], 1, next_pred.shape[0]))
        # make predictions
        forecastPredict = model.predict(next_pred_res, batch_size=1)
        forecastPredictInv = scaler.inverse_transform(forecastPredict)
        forecasts.append(forecastPredictInv)
        next_pred = next_pred[1:]
        next_pred = np.concatenate([next_pred, forecastPredict])
        pred_count += 1

The issue is with this line:

    forecastPredict = model.predict(next_pred_res, batch_size=batch_size)

The error when batch_size here is set to 1 is:

    ValueError: Cannot feed value of shape (1, 1, 2) for Tensor 'lstm_1_input:0', which has shape '(10, 1, 2)'

This is the same error that is thrown when batch_size here is set to 50, and with other batch sizes as well.

The total error is:

        forecastPredict = model.predict(next_pred_res, batch_size=1)
      File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/models.py", line 899, in predict
        return self.model.predict(x, batch_size=batch_size, verbose=verbose)
      File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/engine/training.py", line 1573, in predict
        batch_size=batch_size, verbose=verbose)
      File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/engine/training.py", line 1203, in _predict_loop
        batch_outs = f(ins_batch)
      File "/home/entelechy/tf_keras/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2103, in __call__
        feed_dict=feed_dict)
      File "/home/entelechy/tf_keras/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
        run_metadata_ptr)
      File "/home/entelechy/tf_keras/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 944, in _run
        % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
    ValueError: Cannot feed value of shape (1, 1, 2) for Tensor 'lstm_1_input:0', which has shape '(10, 1, 2)'

Edit: Once I set the model to stateful=False then I am able to use different batch sizes for fitting/training and prediction. What is the reason for this?

895

asked Apr 30 '17 02:04

DanielSon

2 Answers

Unfortunately, what you want to do is impossible with Keras... I've also struggled with this problem for a long time, and the only way is to dive into the rabbit hole and work with TensorFlow directly to do rolling LSTM prediction.

First, to be clear on terminology: batch_size usually means the number of sequences that are trained together, and num_steps means how many time steps are trained together. When you say batch_size=1 and "just predicting the next value", I think you mean predicting with num_steps=1.

Otherwise, it should be possible to train and predict with batch_size=50, meaning you train on 50 sequences and make 50 predictions every time step, one for each sequence (i.e. training and prediction with num_steps=1).

However, I think what you mean is that you want to use a stateful LSTM to train with num_steps=50 and predict with num_steps=1. Theoretically this makes sense and should be possible, and it is possible with TensorFlow, just not with Keras.
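To make the terminology concrete, here is a small NumPy sketch of the 3-D input layout used by Keras recurrent layers, (batch_size, num_steps, features). The sizes are arbitrary illustration values, not the question's data:

```python
import numpy as np

# Illustrative sizes only.
batch_size, num_steps, features = 50, 20, 1

# 50 independent sequences, each 20 time steps long, 1 feature per step.
x = np.arange(batch_size * num_steps * features, dtype=np.float32)
x = x.reshape(batch_size, num_steps, features)

first_sequence = x[0]     # (num_steps, features): one whole sequence
first_step_all = x[:, 0]  # (batch_size, features): step 0 of every sequence

# Predicting the next value of a single sequence from its last step
# means an input with batch_size=1 and num_steps=1.
next_input = x[:1, -1:, :]  # (1, 1, features)

print(x.shape, first_sequence.shape, first_step_all.shape, next_input.shape)
# (50, 20, 1) (20, 1) (50, 1) (1, 1, 1)
```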

The problem: Keras requires an explicit batch size for stateful RNNs. You must specify batch_input_shape=(batch_size, num_steps, features).

The reason: Keras must allocate a fixed-size hidden state vector with shape (batch_size, num_units) in the computation graph in order to persist the values between training batches. When stateful=False, on the other hand, the hidden state vector can be initialized dynamically with zeros at the beginning of each batch, so it does not need a fixed size. More details here: http://philipperemy.github.io/keras-stateful-lstm/
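The same constraint can be reproduced without Keras. Below is a minimal NumPy sketch using a hypothetical `ToyRNN` class (a plain tanh RNN standing in for an LSTM; this is not how Keras is implemented internally). Its weight shapes depend only on `features` and `units`, but a stateful instance has to allocate its carried-over state with a concrete batch size, so a batch of a different size cannot be fed. The stateless path creates a fresh zero state sized to each incoming batch:

```python
import numpy as np


class ToyRNN:
    """Hypothetical tanh RNN, used only to illustrate state shapes."""

    def __init__(self, features, units, batch_size=None, stateful=False, seed=0):
        rng = np.random.default_rng(seed)
        # Weight shapes depend on features and units only, never on batch size.
        self.W = rng.normal(size=(features, units))
        self.U = rng.normal(size=(units, units))
        self.stateful = stateful
        if stateful:
            if batch_size is None:
                raise ValueError("a stateful layer needs a fixed batch_size")
            # The carried-over state must be allocated with a concrete shape.
            self.state = np.zeros((batch_size, units))

    def __call__(self, x):
        """x has shape (batch, num_steps, features); returns the last state."""
        if self.stateful:
            if x.shape[0] != self.state.shape[0]:
                raise ValueError(
                    f"stateful layer expects batch {self.state.shape[0]}, got {x.shape[0]}"
                )
            h = self.state
        else:
            # Stateless: a fresh zero state sized to whatever batch arrives.
            h = np.zeros((x.shape[0], self.U.shape[0]))
        for t in range(x.shape[1]):
            h = np.tanh(x[:, t] @ self.W + h @ self.U)
        if self.stateful:
            self.state = h  # persisted for the next call
        return h


stateless = ToyRNN(features=2, units=4)
print(stateless(np.ones((50, 1, 2))).shape)  # (50, 4)
print(stateless(np.ones((1, 1, 2))).shape)   # (1, 4)

stateful = ToyRNN(features=2, units=4, batch_size=10, stateful=True)
stateful(np.ones((10, 1, 2)))  # works: matches the allocated state
try:
    stateful(np.ones((1, 1, 2)))
except ValueError as err:
    print(err)  # stateful layer expects batch 10, got 1
```

The shapes (10, 1, 2) and (1, 1, 2) in the usage lines were picked to mirror the error in the question.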

Possible workaround: train and predict with num_steps=1. Example: https://github.com/keras-team/keras/blob/master/examples/lstm_stateful.py. This may or may not work for your problem, because the gradient for backpropagation will be computed over only one time step. See: https://github.com/fchollet/keras/issues/3669
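As a rough illustration of why num_steps=1 with a carried state still sees the whole history in the forward pass, here is a NumPy sketch with a toy tanh RNN (hypothetical weights `W` and `U`, standing in for an LSTM). Feeding T one-step chunks while carrying the state gives the same final state as one T-step call, provided row i of every chunk belongs to the same sequence. It does not model training: as the paragraph above notes, backpropagation would then only span one step per batch.

```python
import numpy as np

rng = np.random.default_rng(0)
features, units = 2, 3
# Hypothetical weights of a toy tanh RNN standing in for an LSTM.
W = rng.normal(size=(features, units))
U = rng.normal(size=(units, units))


def run(x, h):
    """Run the toy RNN over x of shape (batch, steps, features), starting from state h."""
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t] @ W + h @ U)
    return h


batch_size, T = 4, 6
x = rng.normal(size=(batch_size, T, features))

# One call over all T steps.
h_full = run(x, np.zeros((batch_size, units)))

# T calls with num_steps=1, carrying the state between calls as a stateful
# layer does. Row i of every chunk must belong to the same sequence i.
h = np.zeros((batch_size, units))
for t in range(T):
    h = run(x[:, t:t + 1, :], h)

print(np.allclose(h, h_full))  # True
```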

My solution: use TensorFlow. In TensorFlow you can train with batch_size=50, num_steps=100, then predict with batch_size=1, num_steps=1. This is possible by creating separate model graphs for training and prediction that share the same RNN weight matrices. See this example for next-character prediction: https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py#L11 and the blog post http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Note that one graph can still only work with one specified batch_size, but you can set up multiple model graphs sharing weights in TensorFlow.
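The weight-sharing idea itself is framework-independent: weight shapes depend only on the input features and the number of units, never on the batch size or number of steps. Here is a toy NumPy sketch, assuming a plain tanh RNN in place of an LSTM and a hypothetical `make_model` helper (this is not the TensorFlow or char-rnn code linked above), in which a batch_size=50, num_steps=100 "training" model and a batch_size=1, num_steps=1 "prediction" model read the same weight arrays:

```python
import numpy as np

rng = np.random.default_rng(1)
features, units = 2, 3
# A single set of weights, shared by reference between both models.
weights = {
    "W": rng.normal(scale=0.2, size=(features, units)),
    "U": rng.normal(scale=0.2, size=(units, units)),
}


def make_model(weights, batch_size, num_steps):
    """Return a toy RNN that only accepts input of shape (batch_size, num_steps, features)."""

    def model(x, h):
        if x.shape[:2] != (batch_size, num_steps):
            raise ValueError(f"expected ({batch_size}, {num_steps}, ...), got {x.shape}")
        for t in range(num_steps):
            h = np.tanh(x[:, t] @ weights["W"] + h @ weights["U"])
        return h

    return model


train_model = make_model(weights, batch_size=50, num_steps=100)
predict_model = make_model(weights, batch_size=1, num_steps=1)

x = rng.normal(size=(50, 100, features))
h_train = train_model(x, np.zeros((50, units)))

# Replay sequence 0 one step at a time through the batch-1 model.
h = np.zeros((1, units))
for t in range(100):
    h = predict_model(x[0:1, t:t + 1, :], h)

print(np.allclose(h[0], h_train[0]))  # True
```

Because both closures hold a reference to the same `weights` dict, any update made for the training configuration is also seen by the prediction configuration; each model still accepts only its own fixed input shape.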

103

answered Sep 27 '22 16:09

Hai-Anh Trinh

Sadly, what you wish for is impossible, because you specify the batch_size when you define the model. However, I found a simple way around this problem: create two models! The first is used for training and the second for prediction, and they share weights:

    # "..." and <...> are placeholders for your own input dimensions and layers
    train_model = Sequential([Input(batch_input_shape=(batch_size, ...)), <continue specifying your model>])
    predict_model = Sequential([Input(batch_input_shape=(1, ...)), <continue specifying exact same model>])

    train_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
    predict_model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())

Now you can use any batch size you want. After you fit your train_model, just save its weights and load them into the predict_model:

    train_model.save_weights('lstm_model.h5')
    predict_model.load_weights('lstm_model.h5')

Notice that you only want to save and load the weights, not the whole model (which includes the architecture, optimizer, etc.). This way you get the trained weights, but you can feed in a batch of size 1. More on saving/loading Keras models: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model

Note that you need to install h5py to use save_weights.
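To show why saving only the weights is enough, here is a NumPy stand-in for the save/load round trip. The `ToyStatefulRNN` class and its `save_weights`, `load_weights` and `predict` methods are hypothetical and only mimic the Keras method names; they use `np.savez` instead of HDF5. The saved arrays have no batch dimension, so a model built with batch size 50 can hand its weights to one built with batch size 1:

```python
import os
import tempfile

import numpy as np

features, units = 2, 3


class ToyStatefulRNN:
    """Hypothetical stateful tanh RNN whose state buffer fixes the batch size."""

    def __init__(self, batch_size, seed):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.2, size=(features, units))
        self.U = rng.normal(scale=0.2, size=(units, units))
        self.state = np.zeros((batch_size, units))  # batch-dependent, not saved

    def save_weights(self, path):
        np.savez(path, W=self.W, U=self.U)  # weights only: no batch dimension

    def load_weights(self, path):
        with np.load(path) as data:
            self.W, self.U = data["W"], data["U"]

    def predict(self, x):
        for t in range(x.shape[1]):
            self.state = np.tanh(x[:, t] @ self.W + self.state @ self.U)
        return self.state


train_model = ToyStatefulRNN(batch_size=50, seed=0)
predict_model = ToyStatefulRNN(batch_size=1, seed=1)  # different initial weights
print(np.array_equal(predict_model.W, train_model.W))  # False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_weights.npz")
    train_model.save_weights(path)
    predict_model.load_weights(path)

print(np.array_equal(predict_model.W, train_model.W))  # True
print(predict_model.predict(np.ones((1, 1, features))).shape)  # (1, 3)
```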

answered Sep 27 '22 18:09

Oren Matar
