
What's the difference between input_shape and batch_input_shape in LSTM

Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?

On a simple example, I couldn't observe any difference between:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(1, batch_input_shape=(None,5,1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))

and

model = Sequential()
model.add(LSTM(1, input_shape=(5,1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))

However, when I set the batch size to 12 with batch_input_shape=(12,5,1) and used batch_size=10 when fitting the model, I got an error:

ValueError: Cannot feed value of shape (10, 5, 1) for Tensor 'lstm_96_input:0', which has shape '(12, 5, 1)'

Which obviously makes sense. However, I can see no point in restricting the batch size at the model level.

Am I missing something?

asked Mar 20 '18 by Andrzej Gis


People also ask

What is Input_shape in LSTM?

The input of an LSTM layer has a shape of (num_timesteps, num_features); therefore, if each input sample has 69 timesteps, where each timestep consists of 1 feature value, then the input shape would be (69, 1).
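For example, a minimal sketch of such a layer (the layer size of 32 is an arbitrary choice for illustration):

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, input_shape=(69, 1)))  # 69 timesteps, 1 feature per timestep
print(model.input_shape)  # (None, 69, 1) -- the batch dimension is left unspecified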

What is Input_shape?

It's just Python notation for creating a tuple that contains only one element. input_shape=(728,) is the same as batch_input_shape=(batch_size, 728). This means that each sample has 728 values.
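A short sketch of the two equivalent ways to declare this (the Dense layer, its 64 units and the batch size of 32 are placeholders chosen for illustration):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_shape=(728,)))             # batch size left as None
# equivalently, with the batch size pinned explicitly:
# model.add(Dense(64, batch_input_shape=(32, 728)))
print(model.input_shape)  # (None, 728)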

What is difference between units input shape and output shape in keras layer class?

For example, if the input shape is (8,) and the number of units is 16, then the output shape is (16,). All layers will have the batch size as the first dimension, so the input shape is represented as (None, 8) and the output shape as (None, 16).
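This can be checked directly (a small sketch based on the numbers above):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(16, input_shape=(8,)))
print(model.input_shape)   # (None, 8)
print(model.output_shape)  # (None, 16)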


1 Answer

Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?

Yes, they are practically equivalent; your experiments confirm it (see also this discussion).
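A quick way to check (a sketch using the two single-layer variants from the question; both report the same inferred input shape):

from keras.models import Sequential
from keras.layers import LSTM

m1 = Sequential()
m1.add(LSTM(1, batch_input_shape=(None, 5, 1), return_sequences=True))

m2 = Sequential()
m2.add(LSTM(1, input_shape=(5, 1), return_sequences=True))

print(m1.input_shape)  # (None, 5, 1)
print(m2.input_shape)  # (None, 5, 1)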

However I can see no point in restricting the batch size on model level.

Restricting the batch size is sometimes necessary; the example that comes to mind is a stateful LSTM, in which the last cell state of a batch is remembered and used to initialize the state for the subsequent batch. This guarantees that the client won't feed different batch sizes into the network. Example code:

# Expected input batch shape: (batch_size, timesteps, data_dim).
# Note that we have to provide the full batch_input_shape since the network is stateful:
# the sample of index i in batch k is the follow-up of the sample of index i in batch k-1.
# batch_size, timesteps and data_dim are assumed to be defined beforehand.
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
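A sketch of how such a stateful model is typically trained (the compile settings, num_epochs and the x_train / y_train arrays are placeholders, not part of the original answer): every batch must contain exactly batch_size samples, batches are not shuffled, and the cell state is cleared manually between epochs.

model.compile(loss='mse', optimizer='adam')
for epoch in range(num_epochs):
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, shuffle=False)
    model.reset_states()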
answered Sep 22 '22 by Maxim