
Concatenating an attention layer with decoder input in a seq2seq model in Keras

I am trying to implement a sequence-to-sequence model with attention using the Keras library. The block diagram of the model is as follows:

[Block diagram of the seq2seq model with attention]

The model embeds the input sequence into 3D tensors. A bidirectional LSTM then forms the encoding layer. Next, the encoded sequences are sent to a custom attention layer, which returns a 2D tensor holding the attention weights for each hidden node.

The decoder input is fed into the model as one-hot vectors. In the decoder (another bidirectional LSTM), both the decoder input and the attention weights are passed as input. The output of the decoder is sent to a time-distributed dense layer with a softmax activation function to get a probability for every time step. The code of the model is as follows:

# Imports assumed; Attention is the custom attention layer described above,
# and MAX_LENGTH_Input, vocab_size_input, etc. are defined elsewhere.
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, TimeDistributed, concatenate

encoder_input = Input(shape=(MAX_LENGTH_Input, ))

embedded = Embedding(input_dim=vocab_size_input, output_dim=embedding_width, trainable=False)(encoder_input)

encoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Input, embedding_width), return_sequences=True, dropout=0.25, recurrent_dropout=0.25))(embedded)

attention = Attention(MAX_LENGTH_Input)(encoder)

decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))

merge = concatenate([attention, decoder_input])

decoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Output, vocab_size_output)))(merge)

output = TimeDistributed(Dense(MAX_LENGTH_Output, activation="softmax"))(decoder)
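
(For context: the custom Attention layer itself is not shown above. A minimal sketch of a layer with the described interface, collapsing the encoder's 3D output into a single 2D context vector per sequence, might look like the following; the class internals here are assumptions, not the actual implementation.)

from keras import backend as K
from keras.layers import Layer

class Attention(Layer):
    """Additive attention that reduces (batch, timesteps, features)
    to a single (batch, features) context vector."""
    def __init__(self, step_dim, **kwargs):
        self.step_dim = step_dim  # number of timesteps, kept for bookkeeping
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # one scoring weight per feature dimension
        self.W = self.add_weight(name="att_weight",
                                 shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform",
                                 trainable=True)
        super(Attention, self).build(input_shape)

    def call(self, x):
        # score each timestep, softmax over time, weighted sum of hidden states
        e = K.squeeze(K.tanh(K.dot(x, self.W)), axis=-1)  # (batch, timesteps)
        a = K.softmax(e)                                  # attention weights over time
        return K.sum(x * K.expand_dims(a), axis=1)        # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])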

The problem arises when I concatenate the attention output with the decoder input. Since the decoder input is a 3D tensor whereas the attention output is a 2D tensor, it raises the following error:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1024), (None, 10, 8281)]

How can I convert a 2D Attention tensor into a 3D tensor?

asked Jul 25 '18 by C M Khaled Saifullah


1 Answer

Based on your block diagram, it looks like you pass the same attention vector to the decoder at every timestep. In that case you can use RepeatVector to copy the attention vector across timesteps, which converts the 2D attention tensor into a 3D tensor:

from keras.layers import RepeatVector

# ...
attention = Attention(MAX_LENGTH_Input)(encoder)
attention = RepeatVector(MAX_LENGTH_Output)(attention)  # (?, 10, 1024)
decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))
merge = concatenate([attention, decoder_input])         # (?, 10, 1024+8281)
# ...

Take note that this will repeat the same attention vector for every timestep.
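
Downstream of the concatenation, the decoder then runs over the merged 3D tensor. As a rough sketch (not part of the original answer) of how the remaining layers could be wired up, using the variable names from the question: note that return_sequences=True is needed on the decoder LSTM so that TimeDistributed receives a 3D input, and the final Dense layer would normally have vocab_size_output units.

decoder = Bidirectional(LSTM(units=hidden_size, return_sequences=True))(merge)      # (?, 10, 1024)
output = TimeDistributed(Dense(vocab_size_output, activation="softmax"))(decoder)   # (?, 10, 8281)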

answered Sep 20 '22 by nuric