I am trying to implement the Model from the article (https://arxiv.org/abs/1411.4389) that basically consists of time-distributed CNNs followed by a sequence of LSTMs using Keras with TF.
However, I am having a problem trying to figure out if I should include the TimeDirstibuted
function just for my Convolutional & Pooling Layers or also for the LSTMs?
Is there a way to run the CNN Layers in parallel (Based on the number of frames in the sequence that I want to process and based on the number of cores that I have)?
And Last, suppose that each entry is composed of "n" frames (in sequence) where n varies based on the current data entry, what is the best suitable input dimension? and would "n" be the batch size? Is there a way to limit the number of CNNs in // to for example 4 (so that you get an output Y after 4 frames are processed)?
P.S.: The inputs are small videos (i.e. a sequence of frames)
P.S.: The output dimension is irrelevant to my question, so it is not discussed here
Thank you
Experiments show that the CNN-LSTM-ML model outperforms other models in terms of prediction accuracy in both the short term (1 month) and long term (12 months). Under the condition that the training data are reduced by 50%, the MAE of the proposed model is 33.6% lower than that of LSTM.
LSTM required more parameters than CNN, but only about half of DNN. While being the slowest to train, their advantage comes from being able to look at long sequences of inputs without increasing the network size.
The CNN Long Short-Term Memory Network or CNN LSTM for short is an LSTM architecture specifically designed for sequence prediction problems with spatial inputs, like images or videos.
TimeDistributed(layer, **kwargs) This wrapper allows to apply a layer to every temporal slice of an input. Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension.
[Edited]
Sorry, only-a-link-answer was bad. So I try to answer question one by one.
if I should include the TimeDirstibuted function just for my Convolutional & Pooling Layers or also for the LSTMs?
Use TimeDistributed function only for Conv and Pooling layers, no need for LSTMs.
Is there a way to run the CNN Layers in parallel?
No, if you use CPU. It's possible if you utilize GPU.
Transparent Multi-GPU Training on TensorFlow with Keras
what is the best suitable input dimension?
Five. (batch, time, width, height, channel).
Is there a way to limit the number of CNNs in // to for example 4
You can do this in the preprocess by manually aligning frames into a specific number, not in the network. In other words, "time" dimension should be 4 if you want to have output after 4 frames are processed.
model = Sequential()
model.add(
TimeDistributed(
Conv2D(64, (3, 3), activation='relu'),
input_shape=(data.num_frames, data.width, data.height, 1)
)
)
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(TimeDistributed(Conv2D(128, (4,4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(256, (4,4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
# extract features and dropout
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.5))
# input to LSTM
model.add(LSTM(256, return_sequences=False, dropout=0.5))
# classifier with sigmoid activation for multilabel
model.add(Dense(data.num_classes, activation='sigmoid'))
Reference:
PRI-MATRIX FACTORIZATION - BENCHMARK
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With