Keras/TF: Time Distributed CNN+LSTM for visual recognition


I am trying to implement the Model from the article (https://arxiv.org/abs/1411.4389) that basically consists of time-distributed CNNs followed by a sequence of LSTMs using Keras with TF.

However, I am having trouble figuring out whether I should apply the TimeDistributed wrapper only to my Convolutional & Pooling layers, or to the LSTMs as well.

Is there a way to run the CNN layers in parallel (based on the number of frames in the sequence that I want to process and on the number of cores that I have)?

And last, suppose that each entry is composed of "n" frames (in sequence), where n varies per data entry: what is the most suitable input dimension? Would "n" be the batch size? And is there a way to limit the number of CNNs running in parallel to, for example, 4 (so that an output Y is produced after 4 frames are processed)?

P.S.: The inputs are small videos (i.e. a sequence of frames)

P.P.S.: The output dimension is irrelevant to my question, so it is not discussed here.

Thank you

asked Jun 27 '17 by charbelfa


1 Answer

[Edited]
Sorry, a link-only answer was bad, so I'll answer the questions one by one.

should I apply the TimeDistributed wrapper only to my Convolutional & Pooling layers, or to the LSTMs as well?

Use the TimeDistributed wrapper only for the Conv and Pooling layers; it is not needed for the LSTMs, since an LSTM already consumes the whole time dimension itself.
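Conceptually, TimeDistributed folds the time axis into the batch axis, applies the wrapped layer to every frame independently, and then restores the time axis. A minimal NumPy sketch of that idea (shapes and the toy "layer" are illustrative, not Keras internals):

```python
import numpy as np

def time_distributed(layer_fn, x):
    # x: (batch, time, ...) -> apply layer_fn to every temporal slice
    batch, time = x.shape[:2]
    flat = x.reshape((batch * time,) + x.shape[2:])    # merge batch & time
    out = layer_fn(flat)                               # per-frame computation
    return out.reshape((batch, time) + out.shape[1:])  # split time back out

# toy "layer": global average pooling over the spatial dims of each frame
frames = np.random.rand(2, 4, 8, 8, 1)                 # (batch, time, h, w, c)
pooled = time_distributed(lambda f: f.mean(axis=(1, 2)), frames)
# pooled.shape == (2, 4, 1): one pooled value per frame, per video
```

Because the same `layer_fn` (i.e. the same weights) is applied to every frame, the CNN learns frame-level features while the LSTM that follows handles the temporal dependencies.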

Is there a way to run the CNN Layers in parallel?

Not if you use a CPU; it is possible if you utilize GPUs. See: Transparent Multi-GPU Training on TensorFlow with Keras

what is the best suitable input dimension?

Five dimensions: (batch, time, width, height, channels). The batch size is the number of videos per gradient update, not "n"; "n" is the time dimension.
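For concreteness, here is what such a 5D input tensor looks like (the sizes are hypothetical: 8 videos of 4 frames each, 64x64 grayscale):

```python
import numpy as np

batch, time, width, height, channels = 8, 4, 64, 64, 1
x = np.zeros((batch, time, width, height, channels), dtype=np.float32)
# x.shape == (8, 4, 64, 64, 1)
# The model's input_shape covers everything except the batch axis:
# input_shape=(time, width, height, channels) == (4, 64, 64, 1)
```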

Is there a way to limit the number of CNNs running in parallel to, for example, 4?

You can do this during preprocessing by manually aligning the frames to a fixed number, not inside the network. In other words, the "time" dimension should be 4 if you want an output after 4 frames are processed.
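One simple way to do that alignment is to truncate longer sequences and zero-pad shorter ones to the target length. A sketch (the helper name and the zero-padding strategy are my own choices; sampling frames evenly instead of truncating is another common option):

```python
import numpy as np

def align_frames(frames, target=4):
    """Truncate or zero-pad a (n, h, w, c) frame sequence to exactly `target` frames."""
    n = frames.shape[0]
    if n >= target:
        return frames[:target]                 # keep the first `target` frames
    pad = np.zeros((target - n,) + frames.shape[1:], dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)  # pad the tail with blank frames

short = np.ones((2, 64, 64, 1))   # a video with only 2 frames
long_ = np.ones((7, 64, 64, 1))   # a video with 7 frames
# both become (4, 64, 64, 1), so they can be stacked into one batch
```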

from keras.models import Sequential
from keras.layers import (Conv2D, Dense, Dropout, Flatten, LSTM,
                          MaxPooling2D, TimeDistributed)

model = Sequential()

model.add(
    TimeDistributed(
        Conv2D(64, (3, 3), activation='relu'), 
        input_shape=(data.num_frames, data.width, data.height, 1)
    )
)
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))

model.add(TimeDistributed(Conv2D(128, (4,4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))

model.add(TimeDistributed(Conv2D(256, (4,4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))

# extract features and dropout 
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.5))

# input to LSTM
model.add(LSTM(256, return_sequences=False, dropout=0.5))

# classifier with sigmoid activation for multilabel
model.add(Dense(data.num_classes, activation='sigmoid'))

Reference:
PRI-MATRIX FACTORIZATION - BENCHMARK

answered Oct 24 '22 by teru