I have read a sequence of images into a numpy array with shape (7338, 225, 1024, 3), where 7338 is the sample size, 225 is the number of time steps, 1024 (32x32) is the number of flattened image pixels, and 3 is the number of channels (RGB).
I have a sequential model with an LSTM layer:
model = Sequential()
model.add(LSTM(128, input_shape=(225, 1024, 3)))
But this results in the error:
Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
The documentation mentions that the input tensor for an LSTM layer should be a 3D tensor with shape (batch_size, timesteps, input_dim), but in my case my input_dim is 2D.
What is the suggested way to input a 3 channel image into an LSTM layer in Keras?
If you want the images to form a sequence (like the frames of a movie), you need to treat both pixels AND channels as features:
input_shape = (225, 3072)  # a 3D input; the batch size 7338 is not specified
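The reshape this implies can be sketched with numpy alone. This is a minimal illustration, not the asker's actual loading code: the array name `frames` is hypothetical, and the sample count is cut from 7338 to 4 so it runs quickly.

```python
import numpy as np

# Stand-in for the real (7338, 225, 1024, 3) array; only the sample count is reduced.
frames = np.arange(4 * 225 * 1024 * 3, dtype="float32").reshape(4, 225, 1024, 3)

# Merge the pixel axis and the channel axis into one feature axis: 1024 * 3 = 3072.
flat = frames.reshape(frames.shape[0], frames.shape[1], -1)

print(flat.shape)  # (4, 225, 3072)
```

With the default row-major layout, the three channel values of each pixel end up next to each other in the feature vector, so no information is lost, only the spatial arrangement is no longer visible to the layer.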
If you want more processing before throwing 3072 features into an LSTM, you can combine or interleave 2D convolutions and LSTMs for a more refined model. This is not necessarily better, though, since each application behaves differently.
You can also try the new ConvLSTM2D, which takes a five-dimensional input:
input_shape=(225, 32, 32, 3)  # a 5D input; the batch size 7338 is not specified
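A rough sketch of what that could look like, using `tensorflow.keras`. Several things here are assumptions rather than part of the question: the 1024 pixels are assumed to have been flattened row by row from 32x32 images, the filter count and kernel size are arbitrary, and an `Input` layer is used in place of the `input_shape` argument because newer Keras versions prefer it.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ConvLSTM2D

# Stand-in for the real data, with 2 samples instead of 7338.
frames = np.zeros((2, 225, 1024, 3), dtype="float32")

# Restore the spatial layout (assumes row-major flattening of 32x32 images).
video = frames.reshape(-1, 225, 32, 32, 3)

model = Sequential()
model.add(Input(shape=(225, 32, 32, 3)))
# Arbitrary hyperparameters; with return_sequences left at its default,
# the layer returns only the output for the last time step.
model.add(ConvLSTM2D(filters=8, kernel_size=(3, 3), padding="same"))

print(video.shape)         # (2, 225, 32, 32, 3)
print(model.output_shape)  # (None, 32, 32, 8)
```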
I'd probably create a convolutional net with several TimeDistributed(Conv2D(...)) and TimeDistributed(MaxPooling2D(...)) layers before adding a TimeDistributed(Flatten()) and finally the LSTM(). This will very probably improve both your image understanding and the performance of the LSTM.
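One way that stack might be put together is sketched below. The number of convolution blocks, the filter counts, and the kernel and pool sizes are placeholders chosen for illustration, and the input is assumed to have been reshaped to (225, 32, 32, 3) per sample.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, MaxPooling2D, Flatten, LSTM, TimeDistributed,
)

model = Sequential()
model.add(Input(shape=(225, 32, 32, 3)))

# Each TimeDistributed wrapper applies the same 2D layer to every frame.
model.add(TimeDistributed(Conv2D(16, (3, 3), activation="relu", padding="same")))
model.add(TimeDistributed(MaxPooling2D((2, 2))))  # per frame: 16 x 16 x 16
model.add(TimeDistributed(Conv2D(32, (3, 3), activation="relu", padding="same")))
model.add(TimeDistributed(MaxPooling2D((2, 2))))  # per frame: 8 x 8 x 32

# Flatten each frame to a feature vector so the LSTM sees (225, 2048).
model.add(TimeDistributed(Flatten()))
model.add(LSTM(128))

print(model.output_shape)  # (None, 128)
```

Compared with feeding 3072 raw values per frame, the LSTM here receives 2048 learned features per frame, and the convolutions share weights across all 225 time steps.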
The Keras RNN guide now includes a section on creating RNNs with nested structures, which allow arbitrary input types at each timestep: https://www.tensorflow.org/guide/keras/rnn#rnns_with_listdict_inputs_or_nested_inputs