Multilayer Seq2Seq model with LSTM in Keras

Tags:

I was making a seq2seq model in keras. I had built single layer encoder and decoder and they were working fine. But now I want to extend it to multi layer encoder and decoder. I am building it using Keras Functional API.

Training:-

Code for encoder:-

encoder_input=Input(shape=(None,vec_dimension))
encoder_lstm=LSTM(vec_dimension,return_state=True,return_sequences=True)(encoder_input)
encoder_lstm=LSTM(vec_dimension,return_state=True)(encoder_lstm)
encoder_output,encoder_h,encoder_c=encoder_lstm

Code for decoder:-

encoder_state=[encoder_h,encoder_c]
decoder_input=Input(shape=(None,vec_dimension))
decoder_lstm= LSTM(vec_dimension,return_state=True,return_sequences=True (decoder_input,initial_state=encoder_state)
decoder_lstm=LSTM(vec_dimension,return_state=True,return_sequences=True)(decoder_lstm)
decoder_output,_,_=decoder_lstm

For testing :-

encoder_model=Model(inputs=encoder_input,outputs=encoder_state)
decoder_state_input_h=Input(shape=(None,vec_dimension))
decoder_state_input_c=Input(shape=(None,vec_dimension))
decoder_states_input=[decoder_state_input_h,decoder_state_input_c]
decoder_output,decoder_state_h,decoder_state_c =decoder_lstm #(decoder_input,initial_state=decoder_states_input)
decoder_states=[decoder_state_h,decoder_state_c]
decoder_model=Model(inputs=[decoder_input]+decoder_states_input,outputs=[decoder_output]+decoder_states)

Now when I try to increase the no. of layers in the decoder for training then training works fine but for testing it dosen't works and throws error.

Actually the problem is when making it multi layer i had shifted the initial_state to a middle layer which used to be specified at the end.So when I am calling it during testing, it is throwing errors.

RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_64:0", shape=(?, ?, 150), dtype=float32) at layer "input_64".The following previous layers were accessed without issue: []

How should I pass the initial_state=decoder_states_input which is for the input layer so that it doesn't throws error. How should I pass the initial_state=decoder_states_input in the end layer for for the first Input layer??

EDIT:-

In that code I have tried to make multiple layers of decoder LSTM. But that's giving error. When working with single layer.The correct codes are:-

Encoder(Training):-

encoder_input=Input(shape=(None,vec_dimension))
encoder_lstm =LSTM(vec_dimension,return_state=True)(encoder_input)
encoder_output,encoder_h,encoder_c=encoder_lstm

Decoder(Training):-

encoder_state=[encoder_h,encoder_c]
decoder_input=Input(shape=(None,vec_dimension))
decoder_lstm= LSTM(vec_dimension, return_state=True, return_sequences=True)
decoder_output,_,_=decoder_lstm(decoder_input,initial_state=encoder_state)

Decoder(Testing)

decoder_output,decoder_state_h,decoder_state_c=decoder_lstm( decoder_input, initial_state=decoder_states_input)
decoder_states=[decoder_state_h,decoder_state_c]
decoder_output,decoder_state_h,decoder_state_c=decoder_lstm (decoder_input,initial_state=decoder_states_input)
decoder_model=Model(inputs=[decoder_input]+decoder_states_input,outputs=[decoder_output]+decoder_states)

306

asked Jun 18 '18 18:06

SAGAR

1 Answers

EDIT - Updated to use the functional API model in Keras vs. the RNN

from keras.models import Model
from keras.layers import Input, LSTM, Dense, RNN
layers = [256,128] # we loop LSTMCells then wrap them in an RNN layer

encoder_inputs = Input(shape=(None, num_encoder_tokens))

e_outputs, h1, c1 = LSTM(latent_dim, return_state=True, return_sequences=True)(encoder_inputs) 
_, h2, c2 = LSTM(latent_dim, return_state=True)(e_outputs) 
encoder_states = [h1, c1, h2, c2]

decoder_inputs = Input(shape=(None, num_decoder_tokens))

out_layer1 = LSTM(latent_dim, return_sequences=True, return_state=True)
d_outputs, dh1, dc1 = out_layer1(decoder_inputs,initial_state= [h1, c1])
out_layer2 = LSTM(latent_dim, return_sequences=True, return_state=True)
final, dh2, dc2 = out_layer2(d_outputs, initial_state= [h2, c2])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(final)


model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.summary()

And here is the inference setup:

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_state_input_h1 = Input(shape=(latent_dim,))
decoder_state_input_c1 = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c, 
                         decoder_state_input_h1, decoder_state_input_c1]
d_o, state_h, state_c = out_layer1(
    decoder_inputs, initial_state=decoder_states_inputs[:2])
d_o, state_h1, state_c1 = out_layer2(
    d_o, initial_state=decoder_states_inputs[-2:])
decoder_states = [state_h, state_c, state_h1, state_c1]
decoder_outputs = decoder_dense(d_o)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

decoder_model.summary()

Lastly, if you are following the Keras seq2seq example, you will have to change the prediction script as there are multiple hidden states that need to be managed vs. just two of them in the single-layer example. There will be 2x the number of layer hidden states

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c, h1, c1 = decoder_model.predict(
            [target_seq] + states_value) #######NOTICE THE ADDITIONAL HIDDEN STATES

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c, h1, c1]#######NOTICE THE ADDITIONAL HIDDEN STATES

    return decoded_sentence


for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Target sentence:', target_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

answered Sep 28 '22 03:09

Jeremy Wortz

Related questions
                            
                                What is a "cell class" in Keras?
                            
                                How to build TensorFlow Lite as a static library and link to it from a separate (CMake) project?
                            
                                How visualize attention LSTM using keras-self-attention package?
                            
                                TensorFlow: How to measure how much GPU memory each tensor takes?
                            
                                TensorFlow while-loop with TensorArray
                            
                                Which NVIDIA cuDNN release type for TensorFlow on Ubuntu 16.04 [closed]
                            
                                How can I use tf.data Datasets in eager execution mode?
                            
                                Tensorflow: I installed CUDA 9.2 but it needs 9.0?
                            
                                How to Merge Numerical and Embedding Sequential Models to treat categories in RNN
                            
                                tf.data vs keras.utils.sequence performance
                            
                                Tensorflow slicing based on variable
                            
                                In Tensorflow, what is the difference between a tensor that has a type ending in _ref and a tensor that does not?
                            
                                Training a Keras model from batches of .npy files using generator?
                            
                                Tensorflow error using my own data
                            
                                ValueError: Feature not in features dictionary
                            
                                Tensorflow feature column for variable list of values
                            
                                Integrating Keras model into TensorFlow
                            
                                Installing Tensorflow 1.10 on El Capitan 10.11.6
                            
                                Python tensorflow lite error：Cannot set tensor: Got tensor of type 1 but expected type 3 for input 88
                            
                                Unable to start TensorFlow within Docker, on Windows

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Multilayer Seq2Seq model with LSTM in Keras

Tags:

tensorflow

keras

lstm

encoder-decoder

seq2seq