Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Application of ConvLSTM2D layers

I was going through some codes over github and noticed a layer called ConvLSTM2D in Keras. The Keras documentations states that It is similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional..

I am wondering what will be the practical application of this layer. I am familiar with NLP and I haven't seen this layer being used.

Which area of Machine Learning / Deep Learning make use of this layer.?

like image 790
Sreeram TP Avatar asked Jan 03 '23 15:01

Sreeram TP


2 Answers

ConvLSTM2D Layer is used in computer vision problems for spatiotemporal problems i.e where you want to extract the spatial features as well as the correlation in time. Refer to the ConvLSTM paper

"Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting"

It explains that the fully-connected LSTM can capture the temporal correlation but do not encode the spatial data. Thats why they propose a model where the input to state and state to state transitions are convolutional

I could find papers where ConvLSTM was a part of the model for natural video sequence prediction, gesture recognition and video classification i.e basically where we want to learn spatial and temporal data

like image 121
user239457 Avatar answered Jan 13 '23 10:01

user239457


As user239457 said is for extracting features that are time and spatial dependent (actually he cited the article in which the ConvLSTM layer was proposed for the first time).

So, what kind of information we consume everyday that is time and spatial correlated? Yep, you guessed right: videos.

In this sense, there are other areas of Deep learning that uses this layer:

  • Object Tracking: sequences of images in which you want to predict the trajectory of a moving target, i.e. from images to bounding boxes
  • Object Segmentation in video: same as before but with segmentation (harder)
  • Activity Recognition: sequences of images in which you want to generate a video caption (a sentence describing what is happening)

If you are curious about this there is a lightweight dataset which you can use to learn the basics of Object Tracking: Moving MNIST

like image 39
JVGD Avatar answered Jan 13 '23 09:01

JVGD