Keras bidirectional LSTM: `initial_state` was passed that is not compatible with `cell.state_size`

I'm attempting to build a stacked bidirectional LSTM seq2seq model in Keras, but I'm running into an issue when passing the output states of the encoder as the initial states of the decoder. Based on this pull request, it looks like that should be possible. Ultimately I want to keep the encoder output vector for additional downstream tasks.

The error message:

ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 100), ndim=2)]; however `cell.state_size` is (100, 100)

My model:

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True), 
                               merge_mode="concat",
                               name="encoder_lstm_2")(encoder_lstm_1)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(LSTM(latent_size_1, return_sequences=True), 
                                merge_mode="concat", 
                                name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Any help/advice/direction is greatly appreciated!

1 Answer

There are two issues with your code:

  1. As @Daniel pointed out, you should not concatenate the states into a single [h, c] pair. The Bidirectional decoder layer needs one [h, c] pair per direction, so pass all four tensors separately: encoder_states = [forward_h, forward_c, backward_h, backward_c].

  2. The states returned by your encoder have size latent_size_2 (not latent_size_1). So if you want to use them as the decoder's initial state, the decoder's first LSTM must also have latent_size_2 units (see the short sketch after this list).
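
To see why four separate state tensors are needed, here is a minimal sketch (the input shape below is a made-up dummy, not from the question): each direction of a Bidirectional LSTM returns its own h and c of size equal to its number of units, and initial_state must supply exactly those four tensors.

from tensorflow.keras.layers import Input, LSTM, Bidirectional

latent_size_2 = 50
x = Input(shape=(10, 8))  # dummy (timesteps, features) just for this check
_, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(LSTM(latent_size_2, return_state=True))(x)
print(fwd_h.shape, fwd_c.shape, bwd_h.shape, bwd_c.shape)  # each is (None, 50)

Passing only two concatenated (None, 100) tensors means each direction's cell receives a single state instead of its [h, c] pair, which is exactly the cell.state_size mismatch reported in the error message.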

You can find the code with these corrections below.

from tensorflow.keras.layers import Embedding, Input, Bidirectional, LSTM, Dense, Concatenate
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import Model

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250
num_words = 5000
embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(1.0),  # dummy stand-in for the real embedding_matrix
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True), 
                               merge_mode="concat", name="encoder_lstm_2")(encoder_lstm_1)
encoder_states = [forward_h, forward_c, backward_h, backward_c]

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(
    LSTM(latent_size_2, return_sequences=True), 
    merge_mode="concat", name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
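
As a quick sanity check (the optimizer and loss below are placeholder choices, not part of the original answer), the model now builds without the initial_state error and emits a softmax over num_words at every decoder timestep:

seq2seq_Model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
seq2seq_Model.summary()
print(seq2seq_Model.output_shape)  # (None, 50, 5000) = (batch, MAX_SEQUENCE_LENGTH, num_words)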