There have been a number of papers (particularly for image captioning) that use CNN and LSTM architectures jointly for prediction and generation tasks. However, they all seem to train the CNN independently from the LSTM. I was looking through Torch and TensorFlow (with Keras), and couldn't find a reason why it shouldn't be possible to do end-to-end training (at least from an architecture design point-of-view), but there doesn't seem to be any documentation for such a model.
So, can it be done? Does Torch or TensorFlow (or even Theano or Caffe) support jointly training an end-to-end CNN-LSTM neural network? If so, is it as simple as just linking the output of the CNN to the input of the LSTM and running SGD? Or is there more complexity to it?
Yes, a CNN-LSTM model can be trained end-to-end using TensorFlow.
Suppose you have a CNN model M with input X and an LSTM model LSTM. The two can be chained and trained end-to-end:
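# the CNN extracts meaningful features from the input data
features = M(X)
# the CNN features are used as input to the LSTM
y = LSTM(features)
# a single cost on the LSTM output; minimizing it with respect to the
# parameters of both M and LSTM trains the two models jointly
cost = cost_function(ground_truths, y)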
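To make the idea concrete without depending on any framework, here is a toy, standard-library-only sketch. It is not the linked example and not how TensorFlow does it internally: a small 1-D convolution stands in for the CNN, a scalar tanh recurrent cell stands in for the LSTM, and gradients are estimated by finite differences instead of automatic differentiation. All names (`conv1d`, `rnn`, `loss`, `numerical_grad`, `train`) and the data are made up for illustration. The point is only that one loss at the end of the chain yields gradients for the convolution kernel as well as for the recurrent weights, so a single SGD step updates both.

```python
import math
import random

def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation) followed by tanh: the toy 'CNN'."""
    k = len(kernel)
    return [math.tanh(sum(kernel[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)]

def rnn(features, w_in, w_rec, w_out):
    """Scalar tanh recurrent cell run over the features: a stand-in for the LSTM."""
    h = 0.0
    for f in features:
        h = math.tanh(w_in * f + w_rec * h)
    return w_out * h

def loss(params, x, target):
    """Squared error of the full chain; params = 3 kernel weights + 3 RNN weights."""
    kernel, (w_in, w_rec, w_out) = params[:3], params[3:]
    y = rnn(conv1d(x, kernel), w_in, w_rec, w_out)
    return (y - target) ** 2

def numerical_grad(params, x, target, eps=1e-6):
    """Central finite differences (real frameworks use backpropagation instead)."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads

def train(steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    params = [rng.uniform(-0.5, 0.5) for _ in range(6)]
    x = [0.1, 0.4, -0.2, 0.7, 0.3, -0.5, 0.2, 0.6]  # made-up input sequence
    target = 0.5                                     # made-up ground truth
    first_grads = numerical_grad(params, x, target)
    history = [loss(params, x, target)]
    for _ in range(steps):
        grads = numerical_grad(params, x, target)
        # One SGD step over ALL parameters: conv kernel and recurrent weights.
        params = [p - lr * g for p, g in zip(params, grads)]
        history.append(loss(params, x, target))
    return first_grads, history

first_grads, history = train()
print("gradient w.r.t. conv kernel:", first_grads[:3])
print("loss before / after:", history[0], history[-1])
```

In an actual framework the same structure holds: the CNN output tensor is passed to the recurrent layer, and the optimizer is given the trainable variables of both, so backpropagation carries the gradient from the loss through the LSTM into the convolutional filters.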
A comprehensive example showing end-to-end training of a CNN-LSTM model for sentence classification on the IMDB dataset is available at CNN_LSTM-end-end.