I am trying to train a 2D convolutional LSTM to make categorical predictions based on video data. However, my output layer seems to be running into a problem:
"ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)"
My current model is based on the ConvLSTM2D example provided by the Keras team. I believe that the above error is the result of my misunderstanding the example and its basic principles.
Data
I have an arbitrary number of videos, where each video contains an arbitrary number of frames. Each frame is 135x240x1 (color channels last). This results in an input shape of (None, None, 135, 240, 1), where the two "None" values are batch size and timesteps, in that order. If I train on a single video with 1052 frames, then my input shape becomes (1, 1052, 135, 240, 1).
For each frame, the model should predict values between 0 and 1 across 9 classes. This means that my output shape is (None, None, 9). If I train on a single video with 1052 frames, then this shape becomes (1, 1052, 9).
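To make the expected shapes concrete, here is a minimal NumPy sketch of one dummy batch matching the description above (the 1052-frame count is just the example figure from the text):

```python
import numpy as np

# One video of 1052 grayscale 135x240 frames, with a 9-class target per frame.
x = np.zeros((1, 1052, 135, 240, 1), dtype=np.float32)  # (batch, time, H, W, channels)
y = np.zeros((1, 1052, 9), dtype=np.float32)            # (batch, time, classes)
print(x.shape, y.shape)
```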
Model
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
dense_1 (Dense) (None, None, 135, 240, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Source code
model = Sequential()
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    input_shape=(None, 135, 240, 1),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(Dense(
    units=classes,
    activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adadelta')
model.fit_generator(generator=training_sequence)
Traceback
Epoch 1/1
Traceback (most recent call last):
File ".\lstm.py", line 128, in <module>
main()
File ".\lstm.py", line 108, in main
model.fit_generator(generator=training_sequence)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
initial_epoch=initial_epoch)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
exception_prefix='target')
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 5 dimensions, but got array with shape (1, 1939, 9)
A sample input shape printed with batch size set to 1 is (1, 1389, 135, 240, 1). This shape matches the requirements I described above, so I think my Keras Sequence subclass (in the source code as "training_sequence") is correct.
I suspect that the problem is caused by going directly from BatchNormalization() to Dense(). After all, the traceback indicates that the problem is occurring in dense_1 (the final layer). However, I wouldn't want to lead anyone astray with my beginner-level knowledge, so please take my assessment with a grain of salt.
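A NumPy-only sketch of the dimension mismatch (the frame count 1939 is taken from the error message; a short 4-frame stand-in is used here to keep the arrays small): Keras's Dense layer acts only on the last axis and keeps every other axis, so applied to a 5D ConvLSTM2D output it still produces a 5D tensor, while the target array is 3D.

```python
import numpy as np

# Stand-in for what dense_1 emits: Dense keeps the (time, H, W) axes,
# so the model output stays 5-dimensional.
model_output = np.zeros((1, 4, 135, 240, 9))

# Stand-in for what the generator yields: one 9-class vector per frame.
target = np.zeros((1, 4, 9))

# 5 vs. 3 dimensions -> Keras's target shape check raises the ValueError.
print(model_output.ndim, target.ndim)
```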
Edit 3/27/2018
After reading this thread, which involves a similar model, I changed my final ConvLSTM2D layer so that the return_sequences parameter is set to False instead of True. I also added a GlobalAveragePooling2D layer before my Dense layer. The updated model is as follows:
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, 135, 240, 40) 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, 135, 240, 40) 160
_________________________________________________________________
global_average_pooling2d_1 ( (None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Here is a new copy of the traceback:
Traceback (most recent call last):
File ".\lstm.py", line 131, in <module>
main()
File ".\lstm.py", line 111, in main
model.fit_generator(generator=training_sequence)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\models.py", line 1253, in fit_generator
initial_epoch=initial_epoch)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 2244, in fit_generator
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1884, in train_on_batch
class_weight=class_weight)
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
exception_prefix='target')
File "C:\Users\matth\Anaconda3\envs\capstone-gpu\lib\site-packages\keras\engine\training.py", line 113, in _standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1, 1034, 9)
I printed the x and y shapes on this run. x was (1, 1034, 135, 240, 1) and y was (1, 1034, 9). This may narrow the problem down. It looks like the problem is related to the y data rather than the x data. Specifically, the Dense layer now rejects the temporal dimension in the target. However, I am not sure how to rectify this issue.
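A small NumPy sketch of this new mismatch: with return_sequences=False followed by GlobalAveragePooling2D, the time axis is dropped, so the model predicts a single 9-way vector per video while the generator still supplies one target per frame (the 1034-frame count is the figure from the traceback).

```python
import numpy as np

per_video_pred = np.zeros((1, 9))          # model output: one prediction per video
per_frame_target = np.zeros((1, 1034, 9))  # generator target: one per frame

# 2 vs. 3 dimensions -> the new "expected dense_1 to have 2 dimensions" error.
print(per_video_pred.ndim, per_frame_target.ndim)
```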
Edit 3/28/2018
Yu-Yang's solution worked. For anyone with a similar problem who wants to see what the final model looked like, here is the summary:
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_1 (ConvLSTM2D) (None, None, 135, 240, 40 59200
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_2 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
conv_lst_m2d_3 (ConvLSTM2D) (None, None, 135, 240, 40 115360
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 135, 240, 40 160
_________________________________________________________________
average_pooling3d_1 (Average (None, None, 1, 1, 40) 0
_________________________________________________________________
reshape_1 (Reshape) (None, None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, None, 9) 369
=================================================================
Total params: 290,769
Trainable params: 290,529
Non-trainable params: 240
Also, the source code:
model = Sequential()
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    input_shape=(None, 135, 240, 1),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
    filters=40,
    kernel_size=(3, 3),
    padding='same',
    return_sequences=True))
model.add(BatchNormalization())
model.add(AveragePooling3D((1, 135, 240)))
model.add(Reshape((-1, 40)))
model.add(Dense(
    units=9,
    activation='sigmoid'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adadelta')
If you want a prediction per frame, then you should definitely set return_sequences=True in your last ConvLSTM2D layer. For the ValueError on target shape, replace the GlobalAveragePooling2D() layer with AveragePooling3D((1, 135, 240)) plus Reshape((-1, 40)) to make the output shape compatible with your target array.
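A NumPy stand-in for that pooling-plus-reshape step, showing how the spatial dimensions collapse while the time axis survives (a short 4-frame clip is used here purely for illustration):

```python
import numpy as np

# Output of the last ConvLSTM2D with return_sequences=True:
# (batch, time, H, W, filters)
x = np.random.rand(1, 4, 135, 240, 40)

# AveragePooling3D((1, 135, 240)) averages over the spatial dims per frame.
pooled = x.mean(axis=(2, 3), keepdims=True)     # -> (1, 4, 1, 1, 40)

# Reshape((-1, 40)) drops the singleton spatial dims, keeping the time axis.
per_frame = pooled.reshape(x.shape[0], -1, 40)  # -> (1, 4, 40)

print(pooled.shape, per_frame.shape)
```

From (1, 4, 40), the Dense layer then maps each frame's 40 features to 9 class scores, matching the (batch, time, 9) target shape.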