
Understanding Seq2Seq model

Here is my understanding of a basic sequence-to-sequence (seq2seq) LSTM model. Suppose we are tackling a question-answering setting.

You have two sets of LSTM cells (green and blue below). The cells within each set share weights (i.e. the 4 green cells all have the same weights, and likewise the blue cells). The first set is a many-to-one LSTM, which summarises the question in its last hidden state / cell memory.

The second set (blue) is a many-to-many LSTM with different weights from the first set. Its input is simply the answer sentence, while its output is the same sentence shifted by one.
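To make "shifted by one" concrete, here is a minimal sketch. It assumes hypothetical start and end tokens (`<s>` and `</s>`), which are my own illustration and not something the diagram specifies:

```python
# Hypothetical special tokens marking the start and end of the answer.
START, END = "<s>", "</s>"


def make_decoder_pair(answer_tokens):
    """Return (decoder_input, decoder_target) for teacher forcing.

    The target is the input shifted left by one position, so at every
    step the decoder is trained to predict the next token.
    """
    decoder_input = [START] + answer_tokens
    decoder_target = answer_tokens + [END]
    return decoder_input, decoder_target


dec_in, dec_target = make_decoder_pair(["i", "am", "fine"])
print(dec_in)      # ['<s>', 'i', 'am', 'fine']
print(dec_target)  # ['i', 'am', 'fine', '</s>']
```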

The question is twofold:
1. Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or both the last hidden state and the cell memory?
2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, is there a reference?

http://suriyadeepan.github.io/img/seq2seq/seq2seq2.png (image taken from suriyadeepan.github.io)

asked Sep 22 '17 01:09 by sachinruk



1 Answer

  1. Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or both the last hidden state and the cell memory?

Both hidden state h and cell memory c are passed to the decoder.
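As a library-free illustration of why both matter, here is a rough sketch of a scalar LSTM in plain Python. The weights and inputs are made-up numbers and the helper names (`lstm_step`, `run`) are hypothetical, not taken from TensorFlow or Keras. It only shows that the recurrent state is the pair (h, c), and that dropping c changes what the decoder computes:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, w):
    """One step of a scalar LSTM cell; w maps a gate name to (w_x, w_h, b)."""
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h + b)

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell value
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run(inputs, state, w):
    """Unroll the cell over inputs, starting from state = (h, c)."""
    h, c = state
    outputs = []
    for x in inputs:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs, (h, c)


# Made-up weights: the encoder and decoder each have their own set.
enc_w = {name: (0.5, 0.5, 0.0) for name in "ifog"}
dec_w = {name: (-0.3, 0.8, 0.1) for name in "ifog"}

# Encoder (many-to-one): only the final state (h, c) is kept.
_, enc_state = run([1.0, -0.5, 0.25, 2.0], (0.0, 0.0), enc_w)

# Decoder initialised with both h and c from the encoder.
dec_outputs, _ = run([0.1, 0.2, 0.3], enc_state, dec_w)

# For comparison: passing only h and resetting c to zero gives different outputs.
dec_outputs_h_only, _ = run([0.1, 0.2, 0.3], (enc_state[0], 0.0), dec_w)
print(dec_outputs != dec_outputs_h_only)  # True
```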

TensorFlow

In seq2seq source code, you can find the following code in basic_rnn_seq2seq():

_, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
return rnn_decoder(decoder_inputs, enc_state, cell)

If you use an LSTMCell, the returned enc_state from the encoder will be a tuple (c, h). As you can see, the tuple is passed directly to the decoder.

Keras

In Keras, the "state" defined for an LSTMCell is also a tuple (h, c) (note that the order is different from TF). In LSTMCell.call(), you can find:

    h_tm1 = states[0]
    c_tm1 = states[1]

To get the states returned from an LSTM layer, you can specify return_state=True. The returned value is a tuple (o, h, c). The tensor o is the output of this layer, which will be equal to h unless you specify return_sequences=True.
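A quick way to check this yourself, assuming TensorFlow 2.x with its bundled Keras is installed (the shapes below are arbitrary and chosen only for illustration):

```python
import numpy as np
from tensorflow import keras

# Arbitrary toy input: batch of 2, 4 time steps, 5 features.
x = np.random.rand(2, 4, 5).astype("float32")

# return_state=True yields the output plus the final h and c.
o, h, c = keras.layers.LSTM(8, return_state=True)(x)
same_as_h = np.allclose(np.asarray(o), np.asarray(h))
print(same_as_h)  # True: without return_sequences, o is the last h

# With return_sequences=True, o holds h for every time step instead.
seq, h2, c2 = keras.layers.LSTM(8, return_sequences=True, return_state=True)(x)
seq_shape = tuple(seq.shape)
last_step_is_h = np.allclose(np.asarray(seq)[:, -1, :], np.asarray(h2))
print(seq_shape)       # (2, 4, 8)
print(last_step_is_h)  # True
```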

  2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, is there a reference?

TensorFlow

Just provide the initial state to an LSTMCell when calling it. For example, in the official RNN tutorial:

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
...
    output, state = lstm(current_batch_of_words, state)

There's also an initial_state argument for functions such as tf.nn.static_rnn. If you use the seq2seq module, provide the states to rnn_decoder as shown in the code for question 1.

Keras

Use the keyword argument initial_state in the LSTM function call.

out = LSTM(32)(input_tensor, initial_state=(h, c))

You can actually find this usage in the official documentation:

Note on specifying the initial state of RNNs

You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
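Putting the two Keras pieces together (return_state on the encoder, initial_state on the decoder), a minimal sketch of the wiring could look like the following. It assumes TensorFlow 2.x with its bundled Keras; the layer sizes and random inputs are placeholders for illustration, and the model is left untrained:

```python
import numpy as np
from tensorflow import keras

# Hypothetical sizes chosen only for illustration.
batch, enc_steps, dec_steps, features, units = 2, 4, 3, 5, 8

encoder_inputs = keras.Input(shape=(enc_steps, features))
decoder_inputs = keras.Input(shape=(dec_steps, features))

# Encoder: discard the output, keep only the final states h and c.
_, h, c = keras.layers.LSTM(units, return_state=True)(encoder_inputs)

# Decoder: a separate LSTM layer (its own weights), initialised with [h, c].
decoder_outputs = keras.layers.LSTM(units, return_sequences=True)(
    decoder_inputs, initial_state=[h, c]
)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Random stand-ins for an embedded question and answer.
question = np.random.rand(batch, enc_steps, features).astype("float32")
answer_in = np.random.rand(batch, dec_steps, features).astype("float32")

prediction = model.predict([question, answer_in], verbose=0)
print(prediction.shape)  # (2, 3, 8)
```

In a real model you would typically add an embedding layer in front and a softmax Dense layer on top of the decoder outputs.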


EDIT:

There's now an example script in Keras (lstm_seq2seq.py) showing how to implement basic seq2seq in Keras. The script also covers how to make predictions after training a seq2seq model.

answered Sep 26 '22 23:09 by Yu-Yang