
Concatenating an attention layer with decoder input in a seq2seq model in Keras

I am trying to implement a sequence-to-sequence model with attention using the Keras library. The block diagram of the model is as follows:

[Block diagram of the seq2seq model with attention]

The model embeds the input sequence into 3D tensors. A bidirectional LSTM then forms the encoding layer. Next, the encoded sequences are sent to a custom attention layer, which returns a 2D tensor holding the attention weights for each hidden node.

The decoder input is fed into the model as one-hot vectors. In the decoder (another bidirectional LSTM), both the decoder input and the attention weights are passed as input. The output of the decoder is sent to a time-distributed dense layer with a softmax activation function to get a probability for every time step. The code of the model is as follows:

# Imports assumed; Attention is the custom attention layer described above,
# and MAX_LENGTH_Input, vocab_size_input, etc. are defined elsewhere.
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, TimeDistributed, concatenate

encoder_input = Input(shape=(MAX_LENGTH_Input, ))

embedded = Embedding(input_dim=vocab_size_input, output_dim=embedding_width, trainable=False)(encoder_input)

encoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Input, embedding_width), return_sequences=True, dropout=0.25, recurrent_dropout=0.25))(embedded)

attention = Attention(MAX_LENGTH_Input)(encoder)

decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))

merge = concatenate([attention, decoder_input])

decoder = Bidirectional(LSTM(units=hidden_size, input_shape=(MAX_LENGTH_Output, vocab_size_output)))(merge)

output = TimeDistributed(Dense(MAX_LENGTH_Output, activation="softmax"))(decoder)
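
(For context: the custom Attention layer itself is not shown above. A minimal sketch of a layer with the described interface, collapsing the encoder's 3D output into a single 2D context vector per sequence, might look like the following; the class internals here are assumptions, not the actual implementation.)

from keras import backend as K
from keras.layers import Layer

class Attention(Layer):
    """Additive attention that reduces (batch, timesteps, features)
    to a single (batch, features) context vector."""
    def __init__(self, step_dim, **kwargs):
        self.step_dim = step_dim  # number of timesteps, kept for bookkeeping
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # one scoring weight per feature dimension
        self.W = self.add_weight(name="att_weight",
                                 shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform",
                                 trainable=True)
        super(Attention, self).build(input_shape)

    def call(self, x):
        # score each timestep, softmax over time, weighted sum of hidden states
        e = K.squeeze(K.tanh(K.dot(x, self.W)), axis=-1)  # (batch, timesteps)
        a = K.softmax(e)                                  # attention weights over time
        return K.sum(x * K.expand_dims(a), axis=1)        # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])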

The problem arises when I concatenate the attention output with the decoder input. Since the decoder input is a 3D tensor whereas the attention output is a 2D tensor, it raises the following error:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 1024), (None, 10, 8281)]

How can I convert a 2D Attention tensor into a 3D tensor?

asked Jul 25 '18 by C M Khaled Saifullah


1 Answer

Based on your block diagram, it looks like you pass the same attention vector to the decoder at every timestep. In that case you can use RepeatVector to copy the attention vector across timesteps, which converts the 2D attention tensor into a 3D tensor:

from keras.layers import RepeatVector

# ...
attention = Attention(MAX_LENGTH_Input)(encoder)
attention = RepeatVector(MAX_LENGTH_Output)(attention)  # (?, 10, 1024)
decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))
merge = concatenate([attention, decoder_input])         # (?, 10, 1024+8281)
# ...

Take note that this will repeat the same attention vector for every timestep.
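
Downstream of the concatenation, the decoder then runs over the merged 3D tensor. As a rough sketch (not part of the original answer) of how the remaining layers could be wired up, using the variable names from the question: note that return_sequences=True is needed on the decoder LSTM so that TimeDistributed receives a 3D input, and the final Dense layer would normally have vocab_size_output units.

decoder = Bidirectional(LSTM(units=hidden_size, return_sequences=True))(merge)      # (?, 10, 1024)
output = TimeDistributed(Dense(vocab_size_output, activation="softmax"))(decoder)   # (?, 10, 8281)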

answered Sep 20 '22 by nuric