Versions: Python 3.6.9, TensorFlow 2.0.0, CUDA 10.0, cuDNN 7.6.1, NVIDIA driver version 410.78.
I'm trying to port an LSTM-based Seq2Seq tf.keras model to TensorFlow 2.0.
Right now I'm facing the following error when I try to call predict on the decoder model (see below for the actual inference setup code).
It is as if it were expecting a single word as its argument, but I need it to decode a full sentence (my sentences are right-padded sequences of word indices, of length 24).
P.S.: This code used to work exactly as it is on TF 1.15.
InvalidArgumentError: [_Derived_] Inputs to operation while/body/_1/Select_2 of type Select must have the same size and shape.
Input 0: [1,100] != input 1: [24,100]
[[{{node while/body/_1/Select_2}}]]
[[lstm_1_3/StatefulPartitionedCall]] [Op:__inference_keras_scratch_graph_45160]
Function call stack:
keras_scratch_graph -> keras_scratch_graph -> keras_scratch_graph
FULL MODEL

ENCODER inference model

DECODER inference model

Important information: sequences are right-padded to 24 elements and 100 is the number of dimensions for each word embedding. This is why the error message (and the prints) show that the input shapes are (24,100).
Note that this code runs on a CPU. Running it on a GPU leads to another error, detailed here.
# original_keyword is a sample text string
with tf.device("/device:CPU:0"):
    # this method turns the raw string into a right-padded sequence
    query_sequence = keyword_to_padded_sequence_single(original_keyword)
    # no problems here
    initial_state = encoder_model.predict(query_sequence)
    print(initial_state[0].shape)  # prints (24, 100)
    print(initial_state[1].shape)  # (24, 100)
    empty_target_sequence = np.zeros((1, 1))
    empty_target_sequence[0, 0] = word_dict_titles["sos"]
    # ERROR HAPPENS HERE:
    # InvalidArgumentError: [_Derived_] Inputs to operation while/body/_1/Select_2 of type Select
    # must have the same size and shape. Input 0: [1,100] != input 1: [24,100]
    decoder_outputs, h, c = decoder_model.predict([empty_target_sequence] + initial_state)
Things I have tried:
Disabling eager mode (this just made training much slower and the error during inference stayed the same).
Reshaping the input prior to feeding it to the predict function.
Manually computing masks (embedding_layer.compute_mask(inputs)) and setting them when calling the LSTM layers.
From what I can see of your model architecture, initial_state is a list of tensors with shapes [(?, 100), (?, 100), (?, 100)]. In your case the unknown dimension is fixed to 24.
Then you build a NumPy array of shape (1, 1), wrap it in a list, and append your initial_state. Hence you get a list of tensors with shapes [(1, 1), (?, 100), (?, 100), (?, 100)].
You then pass it as input to your decoder model, which expects 3 inputs (a list of inputs) with shapes [(?, 24), (?, 100), (?, 100)].
Starting from that, it seems there is something wrong...
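To make the mismatch concrete, here is a minimal sketch in plain Python that compares the leading (batch) dimension of each input before calling predict. The helper name check_batch_dims is hypothetical and not part of Keras, and the shapes are the ones reported in the question; with real arrays you would pass [x.shape for x in inputs].

```python
def check_batch_dims(shapes):
    """Return the set of distinct leading (batch) dimensions in `shapes`.

    `shapes` is a list of shape tuples, e.g. [x.shape for x in inputs].
    A single-element result means all inputs agree on the batch size.
    """
    return {shape[0] for shape in shapes}


# Shapes as reported in the question: the (1, 1) target sequence
# followed by the two encoder states that printed as (24, 100).
decoder_input_shapes = [(1, 1), (24, 100), (24, 100)]

batch_dims = check_batch_dims(decoder_input_shapes)
if len(batch_dims) > 1:
    print("Inconsistent batch sizes:", sorted(batch_dims))
```

One possible reading, which the code shown does not confirm: if the encoder was meant to process a single sentence, its states would be expected to have a leading dimension of 1, so a leading dimension of 24 may mean the 24 word indices were treated as 24 separate samples.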
However, TF complains about the inputs of the operation while/body/_1/Select_2. Input 1 should come from one of your initial_state tensors (which we know have shape (24, 100)). Input 0 seems to come from your empty_target_sequence, which has shape (1, 1) and can be broadcast to (1, 100). By the way, it is strange that it is not broadcast to (24, 100), as both of its dimensions are of size 1...
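The broadcasting remark can be illustrated with a small sketch of NumPy-style broadcasting rules applied to shape tuples. This is plain Python, the function name broadcast_shape is hypothetical, and it only models the general rule; it does not claim to reproduce what TensorFlow does inside the LSTM loop.

```python
def broadcast_shape(a, b):
    """Return the NumPy-style broadcast of two shape tuples, or None if incompatible."""
    result = []
    # Align the shapes from the trailing dimension; missing dimensions count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            return None  # neither equal nor 1: not broadcastable
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


print(broadcast_shape((1, 1), (1, 100)))     # (1, 100)
print(broadcast_shape((1, 1), (24, 100)))    # (24, 100)
print(broadcast_shape((1, 100), (24, 100)))  # (24, 100)
```

Under these rules (1, 100) and (24, 100) are compatible, but the error message states that the Select operation requires inputs of the same size and shape, so it rejects the pair rather than broadcasting it.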
I would recommend checking your graph in TensorBoard. You should be able to find the offending operation and trace its input tensors.