I am trying to train a CNN for object classification, and I would like to feed it some text features in addition to the image.
I found an example of this here: http://cbonnett.github.io/Insight.html
The author builds two models: a CNN for image recognition and a plain feed-forward network (ANN) for the text. He then concatenates their outputs and applies a softmax classifier. The final part of his pipeline looks like this:
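from keras.models import Sequential
from keras.layers import Merge, Dropout, Dense  # Keras 1.x API; Merge was deprecated in Keras 2 and later removed
# cnn_model, text_model, do (the dropout rate) and n_classes are defined elsewhere in the author's code
merged = Merge([cnn_model, text_model], mode='concat')
### final_model takes the combined models and adds a softmax classifier to it
final_model = Sequential()
final_model.add(merged)
final_model.add(Dropout(do))
final_model.add(Dense(n_classes, activation='softmax'))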
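The Merge/Sequential combination above only works in old Keras versions. A minimal sketch of the same "two branches, concatenate, dropout, softmax" idea with the Keras functional API could look as follows. It assumes the text branch receives a fixed-length vector of precomputed text features; the function name build_image_text_model, the input shapes and all layer sizes are hypothetical choices for illustration, not the blog author's values.

```python
import numpy as np
import keras
from keras import layers


def build_image_text_model(image_shape=(64, 64, 3), n_text_features=20,
                           n_classes=5, dropout_rate=0.5):
    # Image branch: a small CNN (layer sizes are illustrative assumptions)
    image_in = keras.Input(shape=image_shape, name="image")
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Text branch: a plain fully connected network on a fixed-length
    # vector of precomputed text features (assumed input format)
    text_in = keras.Input(shape=(n_text_features,), name="text")
    t = layers.Dense(32, activation="relu")(text_in)

    # Concatenate both branches, then dropout + softmax classifier
    merged = layers.concatenate([x, t])
    merged = layers.Dropout(dropout_rate)(merged)
    out = layers.Dense(n_classes, activation="softmax")(merged)
    return keras.Model(inputs=[image_in, text_in], outputs=out)


model = build_image_text_model()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Smoke test on random dummy data (checks shapes only)
dummy_images = np.random.rand(2, 64, 64, 3).astype("float32")
dummy_text = np.random.rand(2, 20).astype("float32")
probs = model.predict([dummy_images, dummy_text], verbose=0)
```

Training then takes both inputs as a list, e.g. model.fit([images, text_features], labels, ...), and the two branches are trained jointly end to end.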
Is this the preferred way of combining image and text inputs, or are there alternative approaches in Keras? Put differently, would it be possible (or would it even make sense) to feed the text directly into the CNN, so that a single CNN handles both images and text?
You are on the right track. And yes, you can also use a CNN to process text; it is often a faster alternative to RNNs and similar models. However, you can't use the same CNN for both text and images: text is a 1D input and an image is a 2D input, not to mention that they originate from different source distributions. So you'll still end up with two sub-models, so to speak:
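A minimal sketch of what such two sub-models could look like (an illustration, not the answerer's code): a 2D CNN over the image and a 1D CNN over the text, joined only after each branch has been reduced to a feature vector. It assumes the text arrives as padded integer token sequences; the function name build_two_branch_cnn, the vocabulary size, sequence length and layer sizes are all hypothetical.

```python
import numpy as np
import keras
from keras import layers


def build_two_branch_cnn(image_shape=(64, 64, 3), vocab_size=10000,
                         max_len=50, n_classes=5):
    # Image sub-model: 2D convolutions slide over height and width
    image_in = keras.Input(shape=image_shape, name="image")
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.MaxPooling2D()(x)
    x = layers.GlobalMaxPooling2D()(x)

    # Text sub-model: token ids -> embeddings -> 1D convolutions that
    # slide along the sequence axis only (assumed padded int sequences)
    text_in = keras.Input(shape=(max_len,), dtype="int32", name="text")
    t = layers.Embedding(vocab_size, 64)(text_in)
    t = layers.Conv1D(64, 5, activation="relu")(t)
    t = layers.GlobalMaxPooling1D()(t)

    # Both branches end in flat vectors, so they can be concatenated
    merged = layers.concatenate([x, t])
    out = layers.Dense(n_classes, activation="softmax")(merged)
    return keras.Model(inputs=[image_in, text_in], outputs=out)


cnn = build_two_branch_cnn()

# Smoke test on random dummy data (checks shapes only)
dummy_images = np.random.rand(2, 64, 64, 3).astype("float32")
dummy_tokens = np.random.randint(0, 10000, size=(2, 50)).astype("int32")
text_cnn_probs = cnn.predict([dummy_images, dummy_tokens], verbose=0)
```

The global pooling layers are what make the merge possible: they collapse the 2D feature maps and the 1D sequence features into fixed-size vectors, regardless of the different input geometries.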