
What's the difference between input_shape and batch_input_shape in LSTM

Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?

On a simple example, I couldn't observe any difference between:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(1, batch_input_shape=(None,5,1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))

and

model = Sequential()
model.add(LSTM(1, input_shape=(5,1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))

However, when I set the batch size to 12 with batch_input_shape=(12,5,1) and used batch_size=10 when fitting the model, I got an error:

ValueError: Cannot feed value of shape (10, 5, 1) for Tensor 'lstm_96_input:0', which has shape '(12, 5, 1)'

Which obviously makes sense. However, I can see no point in restricting the batch size at the model level.

Am I missing something?

asked Mar 20 '18 by Andrzej Gis


People also ask

What is Input_shape in LSTM?

The input of an LSTM layer has a shape of (num_timesteps, num_features); therefore, if each input sample has 69 timesteps, where each timestep consists of 1 feature value, then the input shape would be (69, 1).
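For example, a minimal sketch of such a layer (the layer size of 32 is an arbitrary choice for illustration):

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, input_shape=(69, 1)))  # 69 timesteps, 1 feature per timestep
print(model.input_shape)  # (None, 69, 1) -- the batch dimension is left unspecified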

What is Input_shape?

It's just Python notation for creating a tuple that contains only one element. input_shape=(728,) is the same as batch_input_shape=(batch_size, 728). This means that each sample has 728 values.
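A short sketch of the two equivalent ways to declare this (the Dense layer, its 64 units and the batch size of 32 are placeholders chosen for illustration):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_shape=(728,)))             # batch size left as None
# equivalently, with the batch size pinned explicitly:
# model.add(Dense(64, batch_input_shape=(32, 728)))
print(model.input_shape)  # (None, 728)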

What is difference between units input shape and output shape in keras layer class?

For example, if the input shape is (8,) and the number of units is 16, then the output shape is (16,). All layers will have the batch size as the first dimension, so the input shape is represented as (None, 8) and the output shape as (None, 16).
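This can be checked directly (a small sketch based on the numbers above):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(16, input_shape=(8,)))
print(model.input_shape)   # (None, 8)
print(model.output_shape)  # (None, 16)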


1 Answer

Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?

Yes, they are practically equivalent; your experiments confirm it (see also this discussion).
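A quick way to check (a sketch using the two single-layer variants from the question; both report the same inferred input shape):

from keras.models import Sequential
from keras.layers import LSTM

m1 = Sequential()
m1.add(LSTM(1, batch_input_shape=(None, 5, 1), return_sequences=True))

m2 = Sequential()
m2.add(LSTM(1, input_shape=(5, 1), return_sequences=True))

print(m1.input_shape)  # (None, 5, 1)
print(m2.input_shape)  # (None, 5, 1)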

However I can see no point in restricting the batch size on model level.

Restricting the batch size is sometimes necessary; the example that comes to mind is a stateful LSTM, in which the last cell state of a batch is remembered and used to initialize the state for the subsequent batch. This guarantees that the client won't feed different batch sizes into the network. Example code:

# Expected input batch shape: (batch_size, timesteps, data_dim).
# Note that we have to provide the full batch_input_shape since the network is stateful:
# the sample of index i in batch k is the follow-up of the sample of index i in batch k-1.
# batch_size, timesteps and data_dim are assumed to be defined beforehand.
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
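A sketch of how such a stateful model is typically trained (the compile settings, num_epochs and the x_train / y_train arrays are placeholders, not part of the original answer): every batch must contain exactly batch_size samples, batches are not shuffled, and the cell state is cleared manually between epochs.

model.compile(loss='mse', optimizer='adam')
for epoch in range(num_epochs):
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, shuffle=False)
    model.reset_states()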
answered Sep 22 '22 by Maxim