
Keras CNN: Add text as additional input besides image to CNN

I am trying to train a CNN for object classification. In addition to the image, I would like to feed in some text features.

I found an example of this being done here http://cbonnett.github.io/Insight.html

The author constructs two models: a CNN for the image recognition and a normal ANN for the text. Finally, he merges them and applies a softmax activation. His pipeline looks as follows:

merged = Merge([cnn_model, text_model], mode='concat')

### final_model takes the combined models and adds a softmax classifier to it
final_model = Sequential()
final_model.add(merged)
final_model.add(Dropout(do))
final_model.add(Dense(n_classes, activation='softmax'))

I wonder if this is the preferred method of combining image + text or if there are alternative ways of solving such a task using Keras? Stated differently, would it be possible (or even make sense) to include the text as an input directly to the CNN, such that the CNN takes care of both images and text?

AaronDT asked Sep 18 '25 16:09



1 Answer

You are on the right track. And yes, you can also use a CNN to process text; it is often a faster alternative to RNNs. But you can't use the same CNN to process both text and images. They must be different networks because text is a 1D input and an image is a 2D input, not to mention that the two originate from separate source distributions. So you'll still end up with two sub-models:

  1. Process the image using a CNN model.
  2. Process the text using another model (an RNN, an ANN, a CNN, or just one-hot encoded words, etc.). By CNN I usually mean a 1D CNN that runs over the words in a sentence.
  3. Merge the two latent representations, which carry information about the image and the text.
  4. Run the last few Dense layers for classification.
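
The four steps above can be sketched with the Keras functional API. This is only a minimal illustration, not the linked author's code: the input sizes, layer widths, vocabulary size and the random training data are all made-up assumptions, and the text branch is assumed to receive integer token IDs. As far as I know, the `Merge` layer from the question is no longer available in current Keras releases, so the sketch uses `layers.concatenate` for the merge step instead.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
IMG_SHAPE = (32, 32, 3)   # height, width, channels
SEQ_LEN = 20              # tokens per text sample
VOCAB_SIZE = 1000         # number of distinct token IDs
N_CLASSES = 5

# 1. Image branch: a small 2D CNN that ends in a flat feature vector.
image_in = keras.Input(shape=IMG_SHAPE, name="image")
x = layers.Conv2D(16, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# 2. Text branch: embedding followed by a 1D CNN over the token sequence.
text_in = keras.Input(shape=(SEQ_LEN,), dtype="int32", name="text")
t = layers.Embedding(VOCAB_SIZE, 32)(text_in)
t = layers.Conv1D(32, 3, activation="relu")(t)
t = layers.GlobalMaxPooling1D()(t)

# 3. Merge the two latent vectors by concatenation.
merged = layers.concatenate([x, t])

# 4. Final Dense layers for classification.
h = layers.Dense(64, activation="relu")(merged)
h = layers.Dropout(0.5)(h)
out = layers.Dense(N_CLASSES, activation="softmax")(h)

model = keras.Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random stand-in data, just to show that both inputs are fed together.
images = np.random.rand(8, *IMG_SHAPE).astype("float32")
tokens = np.random.randint(0, VOCAB_SIZE, size=(8, SEQ_LEN))
labels = np.random.randint(0, N_CLASSES, size=(8,))

model.fit([images, tokens], labels, epochs=1, verbose=0)
probs = model.predict([images, tokens], verbose=0)
```

Both branches are trained jointly, so the gradient from the classification loss flows back into the image CNN and the text CNN at the same time. If your text features are already fixed-length numeric vectors rather than token sequences, the text branch could probably be reduced to one or two Dense layers.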
nuric answered Sep 20 '25 06:09
