I am trying to train a 2D convolutional LSTM to make categorical predictions based on video data. However, my output layer seems to be running into a problem:
"ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)"
My current model is based on the ConvLSTM2D example provided by the Keras team. I believe that the above error is the result of my misunderstanding the example and its basic principles.
Data
I have an arbitrary number of videos, where each video contains an arbitrary number of frames. Each frame is 135x240x1 (color channels last). This results in an input shape of (None, None, 135, 240, 1), where the two "None" values are batch size and timesteps, in that order. If I train on a single video with 1052 frames, then my input shape becomes (1, 1052, 135, 240, 1).
For each frame, the model should predict values between 0 and 1 across 9 classes. This means that my output shape is (None, None, 9). If I train on a single video with 1052 frames, then this shape becomes (1, 1052, 9).
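To make the expected shapes concrete, here is a minimal NumPy sketch of one dummy batch matching the description above (the 1052-frame count is just the example figure from the text):

```python
import numpy as np

# One video of 1052 grayscale 135x240 frames, with a 9-class target per frame.
x = np.zeros((1, 1052, 135, 240, 1), dtype=np.float32)  # (batch, time, H, W, channels)
y = np.zeros((1, 1052, 9), dtype=np.float32)            # (batch, time, classes)
print(x.shape, y.shape)
```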
Model
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
dense_1 (Dense) (None, None, 135, 240, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Source code
model = Sequential()
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    input_shape=(None, 135, 240, 1),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(Dense(
    units=classes,
    activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adadelta')
model.fit_generator(generator=training_sequence)
Traceback
Epoch 1/1
Traceback (most recent call last):
File ".\lstm.py", line 128, in <module>
main()
File ".\lstm.py", line 108, in main
model.fit_generator(generator=training_sequence)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
initial_epoch=initial_epoch)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
exception_prefix='target')
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)
A sample input shape printed with batch size set to 1 is (1, 1389, 135, 240, 1). This shape matches the requirements I described above, so I think my Keras Sequence subclass (in the source code as "training_sequence") is correct.
I suspect that the problem is caused by going directly from BatchNormalization() to Dense(). After all, the traceback indicates that the problem is occurring in dense_1 (the final layer). However, I wouldn't want to lead anyone astray with my beginner-level knowledge, so please take my assessment with a grain of salt.
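A NumPy-only sketch of the dimension mismatch (the frame count 1939 is taken from the error message; a short 4-frame stand-in is used here to keep the arrays small): Keras's Dense layer acts only on the last axis and keeps every other axis, so applied to a 5D ConvLSTM2D output it still produces a 5D tensor, while the target array is 3D.

```python
import numpy as np

# Stand-in for what dense_1 emits: Dense keeps the (time, H, W) axes,
# so the model output stays 5-dimensional.
model_output = np.zeros((1, 4, 135, 240, 9))

# Stand-in for what the generator yields: one 9-class vector per frame.
target = np.zeros((1, 4, 9))

# 5 vs. 3 dimensions -> Keras's target shape check raises the ValueError.
print(model_output.ndim, target.ndim)
```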
Edit 3/27/2018
After reading this thread, which involves a similar model, I changed my final ConvLSTM2D layer so that the return_sequences parameter is set to False instead of True. I also added a GlobalAveragePooling2D layer before my Dense layer. The updated model is as follows:
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, 135, 240, 40) 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, 135, 240, 40) 160
_________________________________________________________________
global_average_pooling2d_1 ( (None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Here is a new copy of the traceback:
Traceback (most recent call last):
File ".\lstm.py", line 131, in <module>
main()
File ".\lstm.py", line 111, in main
model.fit_generator(generator=training_sequence)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
initial_epoch=initial_epoch)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
exception_prefix='target')
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1, 1034, 9)
I printed the x and y shapes on this run. x was (1, 1034, 135, 240, 1) and y was (1, 1034, 9). This may narrow the problem down. It looks like the problem is related to the y data rather than the x data. Specifically, the Dense layer now rejects the temporal dimension in the target. However, I am not sure how to rectify this issue.
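A small NumPy sketch of this new mismatch: with return_sequences=False followed by GlobalAveragePooling2D, the time axis is dropped, so the model predicts a single 9-way vector per video while the generator still supplies one target per frame (the 1034-frame count is the figure from the traceback).

```python
import numpy as np

per_video_pred = np.zeros((1, 9))          # model output: one prediction per video
per_frame_target = np.zeros((1, 1034, 9))  # generator target: one per frame

# 2 vs. 3 dimensions -> the new "expected dense_1 to have 2 dimensions" error.
print(per_video_pred.ndim, per_frame_target.ndim)
```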
Edit 3/28/2018
Yu-Yang's solution worked. For anyone with a similar problem who wants to see what the final model looked like, here is the summary:
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
average_pooling3d_1 (Average (None, None, 1, 1, 40) 0
_________________________________________________________________
reshape_1 (Reshape) (None, None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, None, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Also, the source code:
model = Sequential()
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    input_shape=(None, 135, 240, 1),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(AveragePooling3D((1, 135, 240)))
model.add(Reshape((-1, 40)))
model.add(Dense(
    units=9,
    activation='sigmoid'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adadelta')
If you want a prediction per frame, then you should definitely set return_sequences=True in your last ConvLSTM2D layer. For the ValueError on target shape, replace the GlobalAveragePooling2D() layer with AveragePooling3D((1, 135, 240)) plus Reshape((-1, 40)) to make the output shape compatible with your target array.
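A NumPy stand-in for that pooling-plus-reshape step, showing how the spatial dimensions collapse while the time axis survives (a short 4-frame clip is used here purely for illustration):

```python
import numpy as np

# Output of the last ConvLSTM2D with return_sequences=True:
# (batch, time, H, W, filters)
x = np.random.rand(1, 4, 135, 240, 40)

# AveragePooling3D((1, 135, 240)) averages over the spatial dims per frame.
pooled = x.mean(axis=(2, 3), keepdims=True)     # -> (1, 4, 1, 1, 40)

# Reshape((-1, 40)) drops the singleton spatial dims, keeping the time axis.
per_frame = pooled.reshape(x.shape[0], -1, 40)  # -> (1, 4, 40)

print(pooled.shape, per_frame.shape)
```

From (1, 4, 40), the Dense layer then maps each frame's 40 features to 9 class scores, matching the (batch, time, 9) target shape.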