Keras bidirectional LSTM: `initial_state` was passed that is not compatible with `cell.state_size`

I'm attempting to build a stacked bidirectional LSTM seq2seq model in Keras, but I'm running into an issue when passing the output states of the encoder as the initial states of the decoder. Based on this pull request, it looks like that should be possible. Ultimately I want to keep the encoder output vector for additional downstream tasks.

The error message:

ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 100), ndim=2)]; however `cell.state_size` is (100, 100)

My model:

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True), 
                               merge_mode="concat",
                               name="encoder_lstm_2")(encoder_lstm_1)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(LSTM(latent_size_1, return_sequences=True), 
                                merge_mode="concat", 
                                name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Any help/advice/direction is greatly appreciated!

1 Answer

There are two issues with your code:

  1. As @Daniel pointed out, you should not concatenate the states into a single [h, c] pair. The Bidirectional decoder layer needs one [h, c] pair per direction, so pass all four tensors separately: encoder_states = [forward_h, forward_c, backward_h, backward_c].

  2. The states returned by your encoder have size latent_size_2 (not latent_size_1). So if you want to use them as the decoder's initial state, the decoder's first LSTM must also have latent_size_2 units (see the short sketch after this list).
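
To see why four separate state tensors are needed, here is a minimal sketch (the input shape below is a made-up dummy, not from the question): each direction of a Bidirectional LSTM returns its own h and c of size equal to its number of units, and initial_state must supply exactly those four tensors.

from tensorflow.keras.layers import Input, LSTM, Bidirectional

latent_size_2 = 50
x = Input(shape=(10, 8))  # dummy (timesteps, features) just for this check
_, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(LSTM(latent_size_2, return_state=True))(x)
print(fwd_h.shape, fwd_c.shape, bwd_h.shape, bwd_c.shape)  # each is (None, 50)

Passing only two concatenated (None, 100) tensors means each direction's cell receives a single state instead of its [h, c] pair, which is exactly the cell.state_size mismatch reported in the error message.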

You can find the code with these corrections below.

from tensorflow.keras.layers import Embedding, Input, Bidirectional, LSTM, Dense, Concatenate
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import Model

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250
num_words = 5000
embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(1.0),  # dummy stand-in for the real embedding_matrix
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True), 
                               merge_mode="concat", name="encoder_lstm_2")(encoder_lstm_1)
encoder_states = [forward_h, forward_c, backward_h, backward_c]

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(
    LSTM(latent_size_2, return_sequences=True), 
    merge_mode="concat", name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
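
As a quick sanity check (the optimizer and loss below are placeholder choices, not part of the original answer), the model now builds without the initial_state error and emits a softmax over num_words at every decoder timestep:

seq2seq_Model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
seq2seq_Model.summary()
print(seq2seq_Model.output_shape)  # (None, 50, 5000) = (batch, MAX_SEQUENCE_LENGTH, num_words)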