 

TensorFlow sequence-to-sequence model using the seq2seq API (v1.1 and above)

I'm using TensorFlow v1.1, and I would like to implement a sequence-to-sequence model using the tf.contrib.seq2seq API. However, I have a hard time understanding how to use all the functions it provides (BasicDecoder, dynamic_decode, Helper, TrainingHelper, ...) to build my model.

Here is my setup: I would like to "translate" a sequence of feature vectors, (batch_size, encoder_max_seq_len, feature_dim), into a sequence of a different length, (batch_size, decoder_max_len, 1).

I already have the encoder, an RNN with LSTM cells, and I get its final state, which I would like to feed to the decoder as its initial state. I also already have the cell for my decoder, a MultiRNNCell of LSTMs. Could you help me build the last part using the functions of tf.contrib.seq2seq and dynamic_decode (example code or explanations would be much appreciated)?

Here is my code:

import tensorflow as tf
from tensorflow.contrib import seq2seq
from tensorflow.contrib import rnn
import math

from data import gen_sum_2b2

class Seq2SeqModel:
    def __init__(self,
                 in_size,
                 out_size,
                 embed_size,
                 n_symbols,
                 cell_type,
                 n_units,
                 n_layers):
        self.in_size = in_size
        self.out_size = out_size
        self.embed_size = embed_size
        self.n_symbols = n_symbols
        self.cell_type = cell_type
        self.n_units = n_units
        self.n_layers = n_layers

        self.build_graph()

    def build_graph(self):
        self.init_placeholders()
        self.init_cells()
        self.encoder()
        self.decoder_train()
        self.loss()
        self.training()

    def init_placeholders(self):
        with tf.name_scope('Placeholders'):
            self.encoder_inputs = tf.placeholder(shape=(None, None, self.in_size),
                                                 dtype=tf.float32, name='encoder_inputs')
            self.decoder_targets = tf.placeholder(shape=(None, None),
                                                  dtype=tf.int32, name='decoder_targets')
            self.seqs_len = tf.placeholder(dtype=tf.int32, name='seqs_len')
            self.batch_size = tf.placeholder(tf.int32, name='dynamic_batch_size')
            self.max_len = tf.placeholder(tf.int32, name='dynamic_seq_len')
            decoder_inputs = tf.reshape(self.decoder_targets, shape=(self.batch_size,
                                        self.max_len, self.out_size))
            self.decoder_inputs = tf.cast(decoder_inputs, tf.float32)
            self.eos_step = tf.ones([self.batch_size, 1], dtype=tf.float32, name='EOS')
            self.pad_step = tf.zeros([self.batch_size, 1], dtype=tf.float32, name='PAD')

    def RNNCell(self):
        # Stack n_layers cells of the configured type.
        return rnn.MultiRNNCell([self.cell_type(self.n_units) for _ in range(self.n_layers)])

    def init_cells(self):
        with tf.variable_scope('RNN_enc_cell'):
            self.encoder_cell = self.RNNCell()
        with tf.variable_scope('RNN_dec_cell'):
            # Project the decoder outputs onto the n_symbols vocabulary.
            self.decoder_cell = rnn.OutputProjectionWrapper(self.RNNCell(), self.n_symbols)

    def encoder(self):
        with tf.variable_scope('Encoder'):
            self.init_state = self.encoder_cell.zero_state(self.batch_size, tf.float32)
            _, self.encoder_final_state = tf.nn.dynamic_rnn(self.encoder_cell, self.encoder_inputs,
                                                            initial_state=self.init_state)
asked Apr 25 '17 by JimZer


1 Answer

Decoding layer:

The decoding consists of two parts, because training and inference behave differently:

During inference, the decoder input at a particular time step comes from the output of the previous time step. During training, however, the input is fixed to the actual target (the ground-truth target is fed back as input, i.e. teacher forcing), and this has been shown to improve performance.

Both cases are handled using methods from tf.contrib.seq2seq.

  1. The main function for the decoder is seq2seq.dynamic_decode(), which performs dynamic decoding:

    tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations)

    This takes a Decoder instance and maximum_iterations (the maximum sequence length) as inputs.

    1.1 The Decoder instance is from:

    seq2seq.BasicDecoder(cell, helper, initial_state, output_layer)

    The inputs are: cell (an RNNCell instance), helper (a Helper instance), initial_state (the initial state of the decoder, which should be the final state of the encoder) and output_layer (an optional dense layer applied to the decoder outputs to make predictions).

    1.2 The RNNCell instance can be, for example, an rnn.MultiRNNCell().

    1.3 The Helper instance is the one that differs between training and inference. During training, we want the ground-truth inputs to be fed to the decoder, while during inference, we want the output of the decoder at time step (t) to be passed as the input to the decoder at time step (t+1).

    For training: we use the helper seq2seq.TrainingHelper(inputs, sequence_length), which simply reads the provided inputs.

    For inference: we use seq2seq.GreedyEmbeddingHelper() or seq2seq.SampleEmbeddingHelper(); they differ in whether they take the argmax() of the outputs or sample from the output distribution, and they pass the result through an embedding layer to get the next input (a sketch follows this list).
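
To make this concrete, here is a minimal sketch of both decoding paths. The names decoder_cell, encoder_final_state, embeddings, decoder_inputs, decoder_seq_len, batch_size, max_len, GO_ID and EOS_ID are all hypothetical, standing in for your own tensors and token ids:

    import tensorflow as tf
    from tensorflow.contrib import seq2seq

    # Training helper: reads the ground-truth decoder inputs step by step
    # (teacher forcing).
    train_helper = seq2seq.TrainingHelper(inputs=decoder_inputs,
                                          sequence_length=decoder_seq_len)

    # Inference helper: embeds the previous prediction and feeds it back as
    # the next input. GO_ID / EOS_ID are assumed start/end token ids.
    infer_helper = seq2seq.GreedyEmbeddingHelper(
        embedding=embeddings,
        start_tokens=tf.fill([batch_size], GO_ID),
        end_token=EOS_ID)

    def decode(helper, scope, reuse=None):
        # Reusing the same variable_scope shares the decoder weights
        # between the training and inference graphs.
        with tf.variable_scope(scope, reuse=reuse):
            decoder = seq2seq.BasicDecoder(cell=decoder_cell,
                                           helper=helper,
                                           initial_state=encoder_final_state)
            # [0] is the decoder output; its rnn_output field holds the
            # logits and sample_id the predicted symbol ids.
            return seq2seq.dynamic_decode(decoder, maximum_iterations=max_len)[0]

    train_outputs = decode(train_helper, 'decode')
    infer_outputs = decode(infer_helper, 'decode', reuse=True)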

Putting it together: the Seq2Seq model

  1. Get the final state from the encoder layer and pass it as the initial_state to the decoder.
  2. Get the outputs of the training decoder and the inference decoder using seq2seq.dynamic_decode(). When calling both, make sure the weights are shared (use a variable_scope with reuse to share the weights).
  3. Then train the network using the loss function seq2seq.sequence_loss, as sketched below.
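
As a rough sketch of step 3, continuing with the hypothetical names from the snippet above (train_outputs from the training decoder, decoder_targets as the padded target ids, decoder_seq_len and max_len as their lengths):

    logits = train_outputs.rnn_output                    # (batch, time, n_symbols)
    # Mask the padded positions so they don't contribute to the loss.
    weights = tf.sequence_mask(decoder_seq_len, max_len, dtype=tf.float32)
    loss = seq2seq.sequence_loss(logits=logits,
                                 targets=decoder_targets,  # (batch, time) int ids
                                 weights=weights)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)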

Example code is given here and here.

answered Oct 23 '22 by vijay m