I used the VGG 16-layer Caffe model for image captioning, and I have several captions (words) per image. Now I want to generate a full sentence from those words.
I read in a paper on LSTMs that I should remove the SoftMax layer from the trained network and feed the 4096-dimensional feature vector from the fc7 layer directly to the LSTM.
I am new to LSTMs and RNNs.
Where should I begin? Is there a tutorial that shows how to generate a sentence via sequence labeling?
AFAIK, the master branch of BVLC/caffe does not yet support recurrent layer architectures.
You should pull the recurrent branch from jeffdonahue/caffe. This branch supports RNN and LSTM layers.
It also contains a detailed example of generating image captions with a model trained on the MS COCO dataset.
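To make the overall idea concrete before you dig into that example: the image's fc7 feature vector seeds the LSTM's state, and the LSTM then emits one word per step until it produces an end-of-sentence token. Below is a toy numpy sketch of that decoding loop, not the Caffe implementation — the weight matrices are random stand-ins for learned parameters, the vocabulary and `EOS` token id are hypothetical, and the sizes (except the 4096-d feature) are shrunk for illustration.

```python
import numpy as np

np.random.seed(0)

FEAT_DIM, HID, VOCAB = 4096, 8, 5   # toy sizes; the real fc7 output is 4096-d
EOS = 0                              # hypothetical end-of-sentence token id

# Randomly initialised toy weights (in a real model these are learned).
W_img = np.random.randn(HID, FEAT_DIM) * 0.01   # image feature -> initial hidden state
W_x   = np.random.randn(4 * HID, HID) * 0.1     # input-to-gates weights
W_h   = np.random.randn(4 * HID, HID) * 0.1     # hidden-to-gates weights
b     = np.zeros(4 * HID)                       # gate biases
W_out = np.random.randn(VOCAB, HID) * 0.1       # hidden state -> vocabulary logits
E     = np.random.randn(VOCAB, HID) * 0.1       # word embedding table

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One LSTM cell update: compute gates from input x and previous state (h, c)."""
    z = W_x @ x + W_h @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g        # new cell state
    h = o * np.tanh(c)       # new hidden state
    return h, c

def caption(feat, max_len=10):
    """Greedy decoding: condition on the image feature, emit words until EOS."""
    h = np.tanh(W_img @ feat)    # seed the hidden state with the image feature
    c = np.zeros(HID)
    word, out = EOS, []          # reuse EOS as the start token for simplicity
    for _ in range(max_len):
        h, c = lstm_step(E[word], h, c)
        word = int(np.argmax(W_out @ h))   # pick the highest-scoring word
        if word == EOS:
            break
        out.append(word)
    return out

fake_fc7 = np.random.randn(FEAT_DIM)   # stand-in for a real fc7 feature vector
print(caption(fake_fc7))
```

In the Caffe example the same structure appears as an `LSTM` layer unrolled over time, with the fc7 blob wired in as a static input; the toy loop above is only meant to show where the 4096-d vector enters the recurrence.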