Keras: reshape to connect lstm and conv

This question exists as a GitHub issue, too. I would like to build a neural network in Keras which contains both 2D convolutions and an LSTM layer.

The network should classify MNIST. The training data in MNIST are 60000 grey-scale images of handwritten digits from 0 to 9. Each image is 28x28 pixels.

I've split the images into four parts (left/right, up/down) and rearranged them in four orders to get sequences for the LSTM.

|     |      |1 | 2|
|image|  ->  -------   -> 4 sequences: |1|2|3|4|,  |4|3|2|1|, |1|3|2|4|, |4|2|3|1|
|     |      |3 | 4|

Each of the small sub-images has dimensions 14 x 14. The four sequences are stacked together along the width (it shouldn't matter whether width or height).

This creates an array with the shape [60000, 4, 1, 56, 14] where:

  • 60000 is the number of samples
  • 4 is the number of elements in a sequence (# of timesteps)
  • 1 is the depth of colors (greyscale)
  • 56 and 14 are width and height
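
For reference, something like the following NumPy snippet could build such a tensor. This is a rough sketch of the preprocessing described above (the quadrant numbering and the four orderings are assumptions based on the diagram), not the exact code used:

import numpy as np

def make_sequences(x):
    # x: (n, 28, 28) greyscale images
    quads = [x[:, :14, :14],   # 1: top-left
             x[:, :14, 14:],   # 2: top-right
             x[:, 14:, :14],   # 3: bottom-left
             x[:, 14:, 14:]]   # 4: bottom-right
    # the four orderings 1-2-3-4, 4-3-2-1, 1-3-2-4, 4-2-3-1 (zero-indexed below)
    orders = [(0, 1, 2, 3), (3, 2, 1, 0), (0, 2, 1, 3), (3, 1, 2, 0)]
    steps = []
    for t in range(4):
        # at timestep t, stack the t-th quadrant of every ordering along the width
        step = np.concatenate([quads[o[t]] for o in orders], axis=1)   # (n, 56, 14)
        steps.append(step[:, np.newaxis])                              # add the channel dim
    return np.stack(steps, axis=1)                                     # (n, 4, 1, 56, 14)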

Now this should be given to a Keras model. The problem is to change the input dimensions between the CNN and the LSTM. I searched online and found this question: Python keras how to change the size of input after convolution layer into lstm layer

The solution seems to be a Reshape layer which flattens the image but retains the timesteps (as opposed to a Flatten layer which would collapse everything but the batch_size).
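
As a rough illustration of that difference with a made-up (8, 7, 7) input (the shapes here are arbitrary, chosen only so the sizes divide evenly):

from keras.models import Sequential
from keras.layers import Flatten, Reshape

m1 = Sequential()
m1.add(Flatten(input_shape=(8, 7, 7)))
print(m1.output_shape)      # (None, 392): everything but the batch axis is collapsed

m2 = Sequential()
m2.add(Reshape((4, 98), input_shape=(8, 7, 7)))
print(m2.output_shape)      # (None, 4, 98): a timestep axis is kept (4 * 98 == 8 * 7 * 7)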

Here's my code so far:

from keras.models import Sequential
from keras.layers import (Activation, Convolution2D, Dense, Dropout,
                          LSTM, MaxPooling2D, Reshape)

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10
batch_size = 64

model = Sequential()

# convolutional front-end (channels-first input: 1 x 56 x 14)
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
    border_mode="valid", input_shape=[1, 56, 14]))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))

# attempt to flatten the conv output for the LSTM -- this is the line in question
model.add(Reshape((56*14,)))
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))

This code creates an error message:

ValueError: total size of new array must be unchanged

Apparently the input to the Reshape layer is incorrect. As an alternative, I tried to pass the timesteps to the Reshape layer, too:

model.add(Reshape((4,56*14)))

This doesn't feel right and in any case, the error stays the same.

Am I doing this the right way? Is a Reshape layer the proper tool to connect a CNN and an LSTM?

There are rather complex approaches to this problem, such as this one: https://github.com/fchollet/keras/pull/1456 (a TimeDistributed layer which seems to hide the timestep dimension from the following layers).

Or this: https://github.com/anayebi/keras-extra (a set of special layers for combining CNNs and LSTMs).

Why are there such complicated solutions (at least they seem complicated to me), if a simple Reshape does the trick?

UPDATE:

Embarrassingly, I forgot that the dimensions will be changed by the pooling and (for lack of padding) the convolutions, too. kgrm advised me to use model.summary() to check the dimensions.

The output of the layer before the Reshape layer is (None, 32, 26, 5), so I changed the reshape to: model.add(Reshape((32*26*5,))).

Now the ValueError is gone; instead, the LSTM complains:

Exception: Input 0 is incompatible with layer lstm_5: expected ndim=3, found ndim=2

It seems like I need to pass the timestep dimension through the entire network. How can I do that? If I add it to the input_shape of the convolution, it complains, too: Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid", input_shape=[4, 1, 56, 14])

Exception: Input 0 is incompatible with layer convolution2d_44: expected ndim=4, found ndim=5

asked Oct 24 '16 by lhk


People also ask

What is the input shape for LSTM?

The input of the LSTM is always a 3D array: (batch_size, time_steps, features). The output of the LSTM can be a 2D or 3D array depending on the return_sequences argument.
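
For example (a minimal sketch with arbitrary sizes, written against the Keras 1.x API used elsewhere on this page):

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(5, input_shape=(4, 1040)))   # 4 timesteps, 1040 features per timestep
print(model.output_shape)                   # (None, 5) since return_sequences defaults to False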

What is ConvLSTM?

ConvLSTM is a type of recurrent neural network for spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions. The ConvLSTM determines the future state of a certain cell in the grid by the inputs and past states of its local neighbors.

What is the difference between ConvLSTM and CNN LSTM?

The ConvLSTM differs from a simple CNN + LSTM in that, for CNN + LSTM, the convolutional structure (CNN) is applied as the first layer and the LSTM layer is applied afterwards.

How many inputs does LSTM have?

There are 3 inputs to the LSTM cell:

  • ht−1: the hidden state value from the previous timestep (t-1)
  • ct−1: the cell state value from the previous timestep (t-1)
  • xt: the input value at the current timestep (t)


1 Answer

According to the Convolution2D definition, your input must be 4-dimensional with dimensions (samples, channels, rows, cols). This is the direct reason why you are getting an error.

To resolve that you must use the TimeDistributed wrapper. This allows you to apply static (not recurrent) layers across time.
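
A minimal sketch of what that could look like for the shapes in the question (written against the Keras 1.x API used in the question and assuming Theano-style channels-first ordering; the exact layer sizes are illustrative, not taken from the answer):

from keras.models import Sequential
from keras.layers import (Convolution2D, Dense, Dropout, Flatten,
                          LSTM, MaxPooling2D, TimeDistributed, Activation)

model = Sequential()

# wrap the CNN layers in TimeDistributed, so each of the 4 timesteps
# (each a 1 x 56 x 14 image) is processed by the same convolutional stack
model.add(TimeDistributed(Convolution2D(32, 3, 3, activation="relu",
                                        border_mode="valid"),
                          input_shape=(4, 1, 56, 14)))
model.add(TimeDistributed(Convolution2D(32, 3, 3, activation="relu")))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))

# flatten each timestep separately: (None, 4, 32, 26, 5) -> (None, 4, 4160)
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.25))

# the LSTM now receives the required 3D input (batch, timesteps, features)
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(10))          # 10 MNIST classes
model.add(Activation("softmax"))

model.summary()               # check that the shapes line up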

answered Oct 21 '22 by Marcin Możejko