 

Tensorflow 2.0 Combine CNN + LSTM

How can you add an LSTM layer after a (flattened) Conv2D layer in TensorFlow 2.0 / Keras? My training input data has the shape (size, sequence_length, height, width, channels). A convolutional layer can only process one image at a time, while the LSTM layer needs a sequence of features. Is there a way to reshape the data before the LSTM layer so you can combine both?

asked Oct 16 '22 by michael

1 Answer

From the shape you have provided, (size, sequence_length, height, width, channels), it appears that you have a sequence of images for each label. For this purpose, we usually make use of Conv3D. I am enclosing a sample code below:

import tensorflow as tf

SIZE = 64
SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

data = tf.random.normal((SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))

input = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
hidden = tf.keras.layers.Conv3D(32, (3, 3, 3))(input)  # -> (None, 48, 126, 126, 32)
hidden = tf.keras.layers.Reshape((-1, 32))(hidden)     # flatten all non-channel axes into one long sequence
hidden = tf.keras.layers.LSTM(200)(hidden)

model = tf.keras.models.Model(inputs=input, outputs=hidden)
model.summary()

Output:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 50, 128, 128, 3)] 0         
_________________________________________________________________
conv3d (Conv3D)              (None, 48, 126, 126, 32)  2624      
_________________________________________________________________
reshape (Reshape)            (None, None, 32)          0         
_________________________________________________________________
lstm (LSTM)                  (None, 200)               186400    
=================================================================
Total params: 189,024
Trainable params: 189,024
Non-trainable params: 0
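One caveat worth noting (this is not part of the original answer): the Reshape((-1, 32)) above hands the LSTM a sequence of 48 × 126 × 126 ≈ 762k time steps, which is far too long to train in practice. A minimal sketch of one way around this, pooling away the spatial axes so the LSTM sees only the 48 depth steps, each as a 32-dim feature vector:

```python
import tensorflow as tf

SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
x = tf.keras.layers.Conv3D(32, (3, 3, 3))(inputs)       # (None, 48, 126, 126, 32)
# Average-pool the two spatial axes away, keeping the temporal (depth) axis,
# so each of the 48 steps becomes a single 32-dim feature vector.
x = tf.keras.layers.AveragePooling3D((1, 126, 126))(x)  # (None, 48, 1, 1, 32)
x = tf.keras.layers.Reshape((48, 32))(x)                # (None, 48, 32)
x = tf.keras.layers.LSTM(200)(x)                        # (None, 200)

model = tf.keras.models.Model(inputs=inputs, outputs=x)
model.summary()
```

The pooling window (1, 126, 126) is chosen here to exactly collapse the post-convolution spatial dimensions; adjust it if your input sizes differ.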

If you still want to make use of Conv2D, which is not recommended in your case, you will have to do something like the following. Essentially, you stack the sequence of images along the height dimension, which causes you to lose the temporal structure.

import tensorflow as tf

SIZE = 64
SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

data = tf.random.normal((SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))

input = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
hidden = tf.keras.layers.Reshape((SEQUENCE_LENGTH * HEIGHT, WIDTH, CHANNELS))(input)  # stack frames along height
hidden = tf.keras.layers.Conv2D(32, (3, 3))(hidden)
hidden = tf.keras.layers.Reshape((-1, 32))(hidden)  # flatten spatial axes into one sequence
hidden = tf.keras.layers.LSTM(200)(hidden)

model = tf.keras.models.Model(inputs=input, outputs=hidden)
model.summary()

Output:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 50, 128, 128, 3)] 0         
_________________________________________________________________
reshape (Reshape)            (None, 6400, 128, 3)      0         
_________________________________________________________________
conv2d (Conv2D)              (None, 6398, 126, 32)     896       
_________________________________________________________________
reshape_1 (Reshape)          (None, None, 32)          0         
_________________________________________________________________
lstm (LSTM)                  (None, 200)               186400    
=================================================================
Total params: 187,296
Trainable params: 187,296
Non-trainable params: 0
_________________________________________________________________
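A third option, not covered in the answer above but directly matching the question (run Conv2D on each frame, then feed per-frame features to the LSTM), is Keras's TimeDistributed wrapper, which applies the same layer to every time step and keeps the temporal axis intact. A minimal sketch:

```python
import tensorflow as tf

SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
# Apply the same Conv2D to every frame independently; the time axis survives.
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(32, (3, 3)))(inputs)          # (None, 50, 126, 126, 32)
# Pool each frame's feature map down to one 32-dim vector.
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.GlobalAveragePooling2D())(x)         # (None, 50, 32)
x = tf.keras.layers.LSTM(200)(x)                         # (None, 200)

model = tf.keras.models.Model(inputs=inputs, outputs=x)
model.summary()
```

Unlike the height-stacking Reshape trick, this keeps one feature vector per frame, so the LSTM sees a genuine 50-step sequence. The Conv2D/pooling choices here are illustrative; swap in whatever per-frame feature extractor suits your data.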
answered Oct 20 '22 by Prasad