Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?
On a simple example, I couldn't observe any difference between:
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(1, batch_input_shape=(None, 5, 1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))
and
model = Sequential()
model.add(LSTM(1, input_shape=(5,1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))
However, when I set the batch size at the model level to 12 with batch_input_shape=(12, 5, 1) and then used batch_size=10 when fitting the model, I got an error:
ValueError: Cannot feed value of shape (10, 5, 1) for Tensor 'lstm_96_input:0', which has shape '(12, 5, 1)'
Which obviously makes sense. However, I can see no point in restricting the batch size at the model level.
Am I missing something?
The input of an LSTM layer has a shape of (num_timesteps, num_features). Therefore, if each input sample has 69 timesteps, where each timestep consists of 1 feature value, then the input shape would be (69, 1).
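The full array the model actually consumes has the batch dimension prepended to that per-sample shape; a quick NumPy illustration (the sizes here are just for demonstration):

```python
import numpy as np

# a batch of 32 samples, each with 69 timesteps of 1 feature value
x = np.zeros((32, 69, 1))

# the per-sample shape (what you pass as input_shape) excludes the batch dimension
assert x.shape[1:] == (69, 1)
```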
It's just Python notation for creating a tuple that contains only one element. input_shape=(728,) is the same as batch_input_shape=(batch_size, 728). This means that each sample has 728 values.
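The trailing comma is what makes (728,) a one-element tuple rather than a parenthesized number; a pure-Python check:

```python
# (728,) is a tuple with one element; (728) is just the integer 728
shape = (728,)
assert isinstance(shape, tuple) and len(shape) == 1
assert (728) == 728 and not isinstance((728), tuple)
```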
For example, if the input shape is (8,) and the number of units is 16, then the output shape is (16,). All layers will have the batch size as the first dimension, so the input shape will be represented as (None, 8) and the output shape as (None, 16).
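The batch dimension simply rides along through the layer. Sketching a dense layer as a plain matrix multiply (ignoring bias and activation) shows why (None, 8) becomes (None, 16):

```python
import numpy as np

# a Dense-like layer with 8 inputs and 16 units multiplies by an (8, 16) weight matrix
W = np.zeros((8, 16))
x = np.zeros((32, 8))   # a batch of 32 samples, each of shape (8,)
y = x @ W
assert y.shape == (32, 16)  # the batch size (None at build time) passes through unchanged
```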
Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?
Yes, they are practically equivalent; your experiments confirm it. See also this discussion.
However I can see no point in restricting the batch size on model level.
Restricting the batch size is sometimes necessary. The example that comes to my mind is a stateful LSTM, in which the last cell state in a batch is remembered and used to initialize the state for the subsequent batch. This ensures the client won't feed different batch sizes into the network. Example code:
from keras.models import Sequential
from keras.layers import LSTM

batch_size, timesteps, data_dim = 12, 5, 1

# Expected input batch shape: (batch_size, timesteps, data_dim).
# Note that we have to provide the full batch_input_shape since the network is stateful:
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
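To see why a fixed batch size matters here: row i of every batch must continue the same underlying sequence across batches. A NumPy sketch of laying out such consecutive batches (the stream values are illustrative, not from the original post):

```python
import numpy as np

batch_size, timesteps, data_dim = 4, 5, 1

# four independent streams, each a long sequence of scalar values
streams = [np.arange(100, dtype=float).reshape(-1, 1) + 1000 * i
           for i in range(batch_size)]

# batch k stacks window k of every stream, so the sample at index i
# in batch k is the follow-up of the sample at index i in batch k-1
batches = [np.stack([s[k * timesteps:(k + 1) * timesteps] for s in streams])
           for k in range(3)]

assert batches[0].shape == (batch_size, timesteps, data_dim)
# stream 0 continues seamlessly from batch 0 into batch 1
assert batches[1][0, 0, 0] == batches[0][0, -1, 0] + 1
```

If the batch size changed between fit calls, row i would no longer line up with the state saved from the previous batch, which is exactly what the model-level restriction prevents.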