Understanding Seq2Seq model

Here is my understanding of a basic sequence-to-sequence (seq2seq) LSTM model. Suppose we are tackling a question-answering setting.

You have two sets of LSTMs (green and blue below), and each set shares its weights across time steps (i.e. each of the 4 green cells has the same weights, and similarly for the blue cells). The first is a many-to-one LSTM, which summarises the question in its last hidden state and cell memory.

The second set (blue) is a many-to-many LSTM with different weights from the first set. Its input is simply the answer sentence, while its output is the same sentence shifted by one.
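As a minimal sketch of that shift (the tokens and the `<start>`/`<end>` markers are made up for illustration, not taken from any particular dataset):

```python
# Hypothetical tokenised answer; the <start>/<end> markers are an assumption.
answer = ["<start>", "i", "am", "fine", "<end>"]

# Decoder input: every token except the last one.
decoder_input = answer[:-1]
# Decoder target: the same sentence shifted left by one step.
decoder_target = answer[1:]

# At each step the decoder sees one token and should predict the next.
for seen, expected in zip(decoder_input, decoder_target):
    print(f"{seen!r} -> {expected!r}")
```

Input and target have the same length, so each decoder step has exactly one token to predict.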

The question is twofold:

1. Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or both the last hidden state and the cell memory?
2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, is there a reference?

(image taken from suriyadeepan.github.io)

284

asked Sep 22 '17 01:09

sachinruk

1 Answer

Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or both the last hidden state and the cell memory?

Both hidden state h and cell memory c are passed to the decoder.
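To make the handoff concrete, here is a minimal NumPy sketch. It is not the TensorFlow or Keras implementation: the `lstm_step` and `init_params` helpers, the gate ordering, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)  # gate order is an arbitrary choice here
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def init_params(rng, input_dim, units):
    return (rng.normal(scale=0.1, size=(input_dim, 4 * units)),
            rng.normal(scale=0.1, size=(units, 4 * units)),
            np.zeros(4 * units))

rng = np.random.default_rng(0)
input_dim, units = 3, 5
enc_params = init_params(rng, input_dim, units)  # reused at every encoder step
dec_params = init_params(rng, input_dim, units)  # separate weights, reused at every decoder step

question = rng.normal(size=(4, input_dim))   # 4 made-up question vectors
answer_in = rng.normal(size=(6, input_dim))  # 6 made-up decoder inputs

# Encoder: start from zeros and keep only the final (h, c).
h, c = np.zeros(units), np.zeros(units)
for x in question:
    h, c = lstm_step(x, h, c, *enc_params)
enc_h, enc_c = h, c

# Decoder: starts from the encoder's final h AND c, not from zeros.
dec_outputs = []
for x in answer_in:
    h, c = lstm_step(x, h, c, *dec_params)
    dec_outputs.append(h)
dec_outputs = np.stack(dec_outputs)
print(dec_outputs.shape)
```

The only link between the two loops is the pair `(h, c)` carried over from the encoder, which is what the frameworks do when the state tuple is passed to the decoder.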

TensorFlow

In the seq2seq source code, you can find the following code in basic_rnn_seq2seq():

    _, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
    return rnn_decoder(decoder_inputs, enc_state, cell)

If you use an LSTMCell, the returned enc_state from the encoder will be a tuple (c, h). As you can see, the tuple is passed directly to the decoder.

Keras

In Keras, the "state" defined for an LSTMCell is also a tuple (h, c) (note that the order is different from TF). In LSTMCell.call(), you can find:

    h_tm1 = states[0]
    c_tm1 = states[1]

To get the states returned from an LSTM layer, you can specify return_state=True. The returned value is a tuple (o, h, c). The tensor o is the output of this layer, which will be equal to h unless you specify return_sequences=True.
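A small check of this behaviour, assuming TensorFlow 2.x with its bundled Keras and eager execution (the input data and layer sizes are made up):

```python
import numpy as np
from tensorflow import keras

# Made-up batch: (batch, timesteps, features)
x = np.random.rand(2, 7, 3).astype("float32")

# Without return_sequences, the output o is the last hidden state h.
o, h, c = keras.layers.LSTM(4, return_state=True)(x)
same = np.allclose(np.asarray(o), np.asarray(h))

# With return_sequences=True, o holds every time step;
# only its last step matches the returned h.
seq, h2, c2 = keras.layers.LSTM(4, return_sequences=True, return_state=True)(x)
last_step_matches = np.allclose(np.asarray(seq)[:, -1], np.asarray(h2))

print(np.asarray(o).shape, np.asarray(seq).shape)
```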

Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, is there a reference?

TensorFlow

Just provide the initial state to an LSTMCell when calling it. For example, in the official RNN tutorial (TensorFlow 1.x; tf.contrib was removed in TensorFlow 2):

    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    ...
        output, state = lstm(current_batch_of_words, state)

There's also an initial_state argument for functions such as tf.nn.static_rnn. If you use the seq2seq module, provide the states to rnn_decoder as shown in the code for question 1.

Keras

Use the keyword argument initial_state when calling the LSTM layer.

    out = LSTM(32)(input_tensor, initial_state=(h, c))

You can actually find this usage in the official documentation:

Note on specifying the initial state of RNNs

You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
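Putting return_state and initial_state together, a minimal encoder-decoder sketch could look like the following. It assumes TensorFlow 2.x with its bundled Keras. The sizes, variable names and random inputs are made up, and it is an illustration rather than a copy of any official example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_tokens, units = 10, 16  # made-up vocabulary size and state size

# Encoder: discard the output, keep the final hidden state and cell memory.
enc_inputs = keras.Input(shape=(None, num_tokens))
_, state_h, state_c = layers.LSTM(units, return_state=True)(enc_inputs)

# Decoder: a separate LSTM initialised with the encoder's (h, c).
dec_inputs = keras.Input(shape=(None, num_tokens))
dec_seq = layers.LSTM(units, return_sequences=True)(
    dec_inputs, initial_state=[state_h, state_c]
)
dec_outputs = layers.Dense(num_tokens, activation="softmax")(dec_seq)

model = keras.Model([enc_inputs, dec_inputs], dec_outputs)

# Random stand-ins for a question (5 steps) and a shifted answer (6 steps).
q = np.random.rand(2, 5, num_tokens).astype("float32")
a = np.random.rand(2, 6, num_tokens).astype("float32")
probs = model.predict([q, a], verbose=0)
print(probs.shape)
```

Both LSTMs must use the same number of units here, because the encoder states are fed directly into the decoder.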

EDIT:

There's now an example script in Keras (lstm_seq2seq.py) showing how to implement a basic seq2seq model. The script also covers how to make predictions after training.

198

answered Sep 26 '22 23:09

Yu-Yang

Tags:

tensorflow

keras

lstm