I am trying to create an image captioning model. Could you please help with this error? input1 is the image feature vector, input2 is the caption sequence, and 32 is the caption length. I want to concatenate the image vector with the embedding of the sequence and then feed the result to the decoder model.
def define_model(vocab_size, max_length):
    input1 = Input(shape=(512,))
    input1 = tf.keras.layers.RepeatVector(32)(input1)
    print(input1.shape)
    input2 = Input(shape=(max_length,))
    e1 = Embedding(vocab_size, 512, mask_zero=True)(input2)
    print(e1.shape)
    dec1 = tf.concat([input1, e1], axis=2)
    print(dec1.shape)
    dec2 = LSTM(512)(dec1)
    dec3 = LSTM(256)(dec2)
    dec4 = Dropout(0.2)(dec3)
    dec5 = Dense(256, activation="relu")(dec4)
    output = Dense(vocab_size, activation="softmax")(dec5)
    model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    print(model.summary())
    return model
ValueError: Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 512]
This error occurs when an LSTM layer receives 2D input instead of 3D, for instance a tensor of shape (64, 100). The correct format is (n_samples, time_steps, features), e.g. (64, 5, 100).
In this case, the mistake you made was that the input of dec3, which is an LSTM layer, was the output of dec2, which is also an LSTM layer. By default, the return_sequences argument of an LSTM layer is False, so the first LSTM returned only its last hidden state, a 2D tensor, which was incompatible with the next LSTM layer. I solved your issue by setting return_sequences=True in your first LSTM layer, so that it emits the full 3D sequence of hidden states.
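To see the difference, here is a minimal standalone sketch (the shapes are made up for illustration):
import tensorflow as tf
x = tf.random.normal((64, 5, 100))  # (n_samples, time_steps, features)
print(tf.keras.layers.LSTM(16)(x).shape)  # (64, 16): 2D, last time step only
print(tf.keras.layers.LSTM(16, return_sequences=True)(x).shape)  # (64, 5, 16): 3D, one output per time step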
Also, there was an error in this line:
model = tf.keras.Model(inputs=[input1, input2], outputs=output)
input1 was no longer an input layer because you reassigned it. See:
input1 = Input(shape=(512,))
input1 = tf.keras.layers.RepeatVector(32)(input1)
After the second line, the name input1 refers to the RepeatVector output, so the original Input tensor is lost and cannot be passed to tf.keras.Model. I renamed the second one e0, consistent with how you're naming your variables.
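The general pattern in the functional API is to keep a separate reference to every Input tensor so you can hand it to tf.keras.Model later. A minimal sketch, with toy sizes I picked for illustration:
import tensorflow as tf
from tensorflow.keras import Input
inp = Input(shape=(512,))                        # keep this reference for Model(inputs=...)
x = tf.keras.layers.RepeatVector(32)(inp)        # give downstream tensors their own names
out = tf.keras.layers.LSTM(8)(x)
model = tf.keras.Model(inputs=inp, outputs=out)  # works: inp is still the Input tensor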
Now, everything is working. (I also swapped tf.concat for the Keras Concatenate layer, which is the idiomatic way to merge tensors in a functional model, and shrank the layer sizes to keep the demo small; you can restore your original 512/256 units.)
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras import Input

vocab_size, max_length = 1000, 32

input1 = Input(shape=(128,))                              # image feature vector
e0 = tf.keras.layers.RepeatVector(32)(input1)             # repeat it once per time step
print(e0.shape)
input2 = Input(shape=(max_length,))                       # caption token sequence
e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
print(e1.shape)
dec1 = Concatenate()([e0, e1])                            # (None, 32, 256)
print(dec1.shape)
dec2 = LSTM(16, return_sequences=True)(dec1)              # full sequence for the next LSTM
dec3 = LSTM(16)(dec2)                                     # last hidden state only
dec4 = Dropout(0.2)(dec3)
dec5 = Dense(32, activation="relu")(dec4)
output = Dense(vocab_size, activation="softmax")(dec5)
model = tf.keras.Model(inputs=[input1, input2], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
Model: "model_2"
_________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
=================================================================================
input_24 (InputLayer)           [(None, 128)]        0
_________________________________________________________________________________
input_25 (InputLayer)           [(None, 32)]         0
_________________________________________________________________________________
repeat_vector_12 (RepeatVector) (None, 32, 128)      0           input_24[0][0]
_________________________________________________________________________________
embedding_11 (Embedding)        (None, 32, 128)      128000      input_25[0][0]
_________________________________________________________________________________
concatenate_7 (Concatenate)     (None, 32, 256)      0           repeat_vector_12[0][0]
                                                                 embedding_11[0][0]
_________________________________________________________________________________
lstm_12 (LSTM)                  (None, 32, 16)       17472       concatenate_7[0][0]
_________________________________________________________________________________
lstm_13 (LSTM)                  (None, 16)           2112        lstm_12[0][0]
_________________________________________________________________________________
dropout_2 (Dropout)             (None, 16)           0           lstm_13[0][0]
_________________________________________________________________________________
dense_4 (Dense)                 (None, 32)           544         dropout_2[0][0]
_________________________________________________________________________________
dense_5 (Dense)                 (None, 1000)         33000       dense_4[0][0]
=================================================================================
Total params: 181,128
Trainable params: 181,128
Non-trainable params: 0
_________________________________________________________________________________
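As a quick sanity check, you can call the model on random dummy data (the batch size of 4 is arbitrary) and confirm the output shape:
import numpy as np
dummy_images = np.random.rand(4, 128).astype("float32")  # 4 fake image feature vectors
dummy_captions = np.random.randint(1, vocab_size, size=(4, max_length))  # 4 fake caption sequences
preds = model([dummy_images, dummy_captions])
print(preds.shape)  # (4, 1000): one distribution over the vocabulary per sample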