Are (1) Convolution2D layers + LSTM layers and (2) ConvLSTM2D the same?

If there is any difference, could you explain it for me?
ConvLSTM2D is an implementation of the paper Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting, which introduces a special architecture that combines the gating of an LSTM with 2D convolutions. The architecture is recurrent: it keeps a hidden state between steps.
ConvLSTM is a kind of LSTM which contains a convolution operation inside the LSTM cell. CNN LSTM is rather a CNN+LSTM model where, instead of feeding the input directly to the LSTM, it is first processed through a CNN, and the output of the CNN is then fed to the LSTM.
The CNN LSTM architecture involves using Convolutional Neural Network (CNN) layers for feature extraction on input data combined with LSTMs to support sequence prediction.
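To illustrate the distinction, here is a minimal sketch of both patterns, assuming tf.keras (the layer sizes and input shape are illustrative, not from the answers above):

```python
from tensorflow.keras import layers, models

# Illustrative input: sequences of 10 frames, each 64x64 with 1 channel.
seq_shape = (10, 64, 64, 1)

# CNN LSTM: a CNN processes each frame, and the CNN's output is fed to an LSTM.
cnn_lstm = models.Sequential([
    layers.TimeDistributed(layers.Conv2D(16, 3, activation='relu'),
                           input_shape=seq_shape),
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),
    layers.LSTM(32),
])

# ConvLSTM: the convolution happens inside the recurrent cell itself.
conv_lstm = models.Sequential([
    layers.ConvLSTM2D(16, 3, input_shape=seq_shape),
])
```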
They are not exactly the same; here is why:
Convolution2D layers and LSTM layers

As is well known, Convolution2D serves well for capturing image or spatial features, whilst LSTM is used to detect correlations over time. However, by stacking these kinds of layers, the correlation between space and time features may not be captured properly; the sketch after this paragraph makes the reason concrete.
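A minimal sketch of the stacked approach, assuming tf.keras (shapes and layer sizes are illustrative). Note that the spatial feature maps must be collapsed to a flat vector before the LSTM, so the recurrent state carries no spatial layout:

```python
from tensorflow.keras import layers, models

# Stacked Conv2D + LSTM: features are extracted per frame, then flattened,
# so the recurrence itself never sees the 2D structure.
stacked = models.Sequential([
    layers.TimeDistributed(layers.Conv2D(8, 3, activation='relu'),
                           input_shape=(10, 32, 32, 1)),  # 10 frames of 32x32x1
    layers.TimeDistributed(layers.Flatten()),             # (10, 30*30*8) per sample
    layers.LSTM(64),                                      # state: flat vector of 64 units
])
```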
ConvLSTM2D

To solve this, Xingjian Shi et al. proposed a network structure able to capture spatiotemporal correlations, namely ConvLSTM. In Keras, this is reflected in the ConvLSTM2D class, which computes convolutional operations in both the input and the recurrent transformations.
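A minimal usage sketch, assuming tf.keras (the filter counts, kernel size, and input shape are illustrative): ConvLSTM2D takes 5D input of shape (batch, time, rows, cols, channels), and return_sequences=True lets you stack several such layers:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.ConvLSTM2D(32, kernel_size=3, padding='same',
                      return_sequences=True,           # keep the time axis
                      input_shape=(None, 64, 64, 1)),  # variable-length sequences
    layers.ConvLSTM2D(16, kernel_size=3, padding='same'),  # final hidden state only
])
model.summary()
```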
To illustrate the internal difference, you can look at the LSTM code: if you go to the call method of LSTMCell, you'd only see:
```python
# Dense (fully connected) transformations of the input, one per gate.
x_i = K.dot(inputs_i, self.kernel_i)
x_f = K.dot(inputs_f, self.kernel_f)
x_c = K.dot(inputs_c, self.kernel_c)
x_o = K.dot(inputs_o, self.kernel_o)
```
Instead, the ConvLSTM2DCell class calls:
```python
# Convolutional transformations of the input, one per gate.
x_i = self.input_conv(inputs_i, self.kernel_i, self.bias_i, padding=self.padding)
x_f = self.input_conv(inputs_f, self.kernel_f, self.bias_f, padding=self.padding)
x_c = self.input_conv(inputs_c, self.kernel_c, self.bias_c, padding=self.padding)
x_o = self.input_conv(inputs_o, self.kernel_o, self.bias_o, padding=self.padding)
# Convolutional transformations of the previous hidden state, one per gate.
h_i = self.recurrent_conv(h_tm1_i, self.recurrent_kernel_i)
h_f = self.recurrent_conv(h_tm1_f, self.recurrent_kernel_f)
h_c = self.recurrent_conv(h_tm1_c, self.recurrent_kernel_c)
h_o = self.recurrent_conv(h_tm1_o, self.recurrent_kernel_o)
```
Where:
```python
def input_conv(self, x, w, b=None, padding='valid'):
    conv_out = K.conv2d(x, w, strides=self.strides,
                        padding=padding,
                        data_format=self.data_format,
                        dilation_rate=self.dilation_rate)
    if b is not None:
        conv_out = K.bias_add(conv_out, b,
                              data_format=self.data_format)
    return conv_out

def recurrent_conv(self, x, w):
    conv_out = K.conv2d(x, w, strides=(1, 1),
                        padding='same',
                        data_format=self.data_format)
    return conv_out
```
In LSTM, the equivalent for h_x (the recurrent transformations) would be:

```python
K.dot(h_tm1_x, self.recurrent_kernel_x)
```
Instead of ConvLSTM2D's:

```python
self.recurrent_conv(h_tm1_x, self.recurrent_kernel_x)
```
This kind of transformation could not be computed with stacked Conv2D and LSTM layers. The sketch below makes the difference in the recurrent weights concrete.
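A minimal sketch, assuming tf.keras (the layer sizes, kernel size, and input shapes are illustrative), that contrasts the two recurrent kernels by building each layer and inspecting its weights:

```python
import tensorflow as tf

lstm = tf.keras.layers.LSTM(16)
lstm.build((None, 10, 32))                    # (batch, time, features)
# Dense recurrent kernel of shape (units, 4 * units):
# the hidden state is a flat vector.
print(lstm.cell.recurrent_kernel.shape)       # (16, 64)

conv_lstm = tf.keras.layers.ConvLSTM2D(16, kernel_size=3)
conv_lstm.build((None, 10, 32, 32, 1))        # (batch, time, rows, cols, channels)
# Convolutional recurrent kernel of shape (k, k, filters, 4 * filters):
# the hidden state keeps its 2D spatial layout between steps.
print(conv_lstm.cell.recurrent_kernel.shape)  # (3, 3, 16, 64)
```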
1. Use Convolution2D layers and LSTM layers

In this technique, you stack convolution and LSTM layers. The convolutional layers help you learn the spatial features, and the LSTM helps you learn the correlations in time.
2. Use ConvLSTM2D

ConvLSTM is an LSTM in which the gates (the input-to-state and state-to-state transitions) are convolution operations; the full equations from the paper are reproduced below.
Research paper: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
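For reference, the ConvLSTM equations as given in that paper, where $*$ denotes convolution and $\circ$ the Hadamard product (note that the Keras ConvLSTM2DCell code shown above omits the peephole terms $W_{c\cdot} \circ C$):

$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
$$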
More about ConvLSTM in this SO answer