I'm using TensorFlow v1.1, and I would like to implement a sequence-to-sequence model using the tf.contrib.seq2seq API.
However, I have a hard time understanding how to use all the functions (BasicDecoder, dynamic_decode, Helper, TrainingHelper, ...) provided to build my model.
Here is my setup: I would like to "translate" a sequence of feature vectors, (batch_size, encoder_max_seq_len, feature_dim), into a sequence of a different length, (batch_size, decoder_max_len, 1).
I already have the encoder, an RNN with LSTM cells, and I get its final state, which I would like to feed to the decoder as its initial state.
I also already have the cell for my decoder, a MultiRNNCell of LSTM cells.
Could you help me build the last part using the functions of tf.contrib.seq2seq and dynamic_decode (example code or explanations would be much appreciated)?
Here is my code:
import tensorflow as tf
from tensorflow.contrib import seq2seq
from tensorflow.contrib import rnn
import math

from data import gen_sum_2b2


class Seq2SeqModel:
    def __init__(self,
                 in_size,
                 out_size,
                 embed_size,
                 n_symbols,
                 cell_type,
                 n_units,
                 n_layers):
        self.in_size = in_size
        self.out_size = out_size
        self.embed_size = embed_size
        self.n_symbols = n_symbols
        self.cell_type = cell_type
        self.n_units = n_units
        self.n_layers = n_layers
        self.build_graph()

    def build_graph(self):
        self.init_placeholders()
        self.init_cells()
        self.encoder()
        self.decoder_train()
        self.loss()
        self.training()

    def init_placeholders(self):
        with tf.name_scope('Placeholders'):
            self.encoder_inputs = tf.placeholder(shape=(None, None, self.in_size),
                                                 dtype=tf.float32, name='encoder_inputs')
            self.decoder_targets = tf.placeholder(shape=(None, None),
                                                  dtype=tf.int32, name='decoder_targets')
            self.seqs_len = tf.placeholder(dtype=tf.int32)
            self.batch_size = tf.placeholder(tf.int32, name='dynamic_batch_size')
            self.max_len = tf.placeholder(tf.int32, name='dynamic_seq_len')
            decoder_inputs = tf.reshape(self.decoder_targets,
                                        shape=(self.batch_size, self.max_len, self.out_size))
            self.decoder_inputs = tf.cast(decoder_inputs, tf.float32)
            self.eos_step = tf.ones([self.batch_size, 1], dtype=tf.float32, name='EOS')
            self.pad_step = tf.zeros([self.batch_size, 1], dtype=tf.float32, name='PAD')

    def RNNCell(self):
        # Stack n_layers cells of the configured type.
        return rnn.MultiRNNCell([self.cell_type(self.n_units) for _ in range(self.n_layers)])

    def init_cells(self):
        with tf.variable_scope('RNN_enc_cell'):
            self.encoder_cell = self.RNNCell()
        with tf.variable_scope('RNN_dec_cell'):
            self.decoder_cell = rnn.OutputProjectionWrapper(self.RNNCell(), self.n_symbols)

    def encoder(self):
        with tf.variable_scope('Encoder'):
            self.init_state = self.encoder_cell.zero_state(self.batch_size, tf.float32)
            _, self.encoder_final_state = tf.nn.dynamic_rnn(self.encoder_cell, self.encoder_inputs,
                                                            initial_state=self.init_state)
Decoding layer:

The decoding consists of two parts, because they differ between training and inference:

The decoder input at a particular time-step always comes from the output of the previous time-step. During training, however, the output is fixed to the actual target (the actual target is fed back as the input), and this has been shown to improve performance.

Both cases are handled using methods from tf.contrib.seq2seq.

1. The main function for the decoder is seq2seq.dynamic_decode(), which performs dynamic decoding:

tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations)

This takes a Decoder instance and maximum_iterations (the maximum sequence length) as inputs.
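For illustration, a minimal sketch of the call, assuming a decoder built as in 1.1 below (note that in TF 1.1 dynamic_decode returns a (final_outputs, final_state) pair; later 1.x releases also return the final sequence lengths):

    # final_outputs is a BasicDecoderOutput named tuple with two fields:
    #   rnn_output -- the (projected) decoder outputs at every time-step
    #   sample_id  -- the ids chosen by the helper at every time-step
    final_outputs, final_state = seq2seq.dynamic_decode(
        decoder, maximum_iterations=self.max_len)
    logits = final_outputs.rnn_output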
1.1 The Decoder instance comes from:

seq2seq.BasicDecoder(cell, helper, initial_state, output_layer)

Its inputs are: cell (an RNNCell instance), helper (a Helper instance), initial_state (the initial state of the decoder, which should be the final state of the encoder) and output_layer (an optional dense layer applied to the outputs to make predictions).

1.2 The RNNCell instance can be a rnn.MultiRNNCell().
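A minimal sketch of 1.1 and 1.2 together (helper is whichever Helper instance applies, see 1.3; in TF 1.1 the optional output_layer must be a Layer instance, e.g. Dense from tensorflow.python.layers.core):

    from tensorflow.python.layers.core import Dense

    # Optional projection of the cell output to n_symbols logits per step.
    output_layer = Dense(self.n_symbols, name='output_projection')

    decoder = seq2seq.BasicDecoder(
        cell=self.decoder_cell,                  # e.g. a rnn.MultiRNNCell
        helper=helper,                           # see 1.3 below
        initial_state=self.encoder_final_state,  # final state of the encoder
        output_layer=output_layer)

Since your decoder_cell is already wrapped in an OutputProjectionWrapper, you could equally leave output_layer out and let the wrapper do the projection.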
1.3 The Helper instance is the one that differs between training and inference. During training, we want the inputs to be fed to the decoder, while during inference, we want the output of the decoder at time-step (t) to be passed as the input to the decoder at time-step (t+1).

For training: we use the helper function

seq2seq.TrainingHelper(inputs, sequence_length)

which just reads the inputs.
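In your model this could look as follows (a sketch, assuming seqs_len holds the decoder sequence lengths):

    # At step t the helper simply reads decoder_inputs[:, t, :].
    train_helper = seq2seq.TrainingHelper(
        inputs=self.decoder_inputs,     # (batch_size, decoder_max_len, out_size)
        sequence_length=self.seqs_len)  # (batch_size,) int32 vector of lengths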
For inference: we call the helper function

seq2seq.GreedyEmbeddingHelper() or seq2seq.SampleEmbeddingHelper()

which differ in whether they take the argmax() of the outputs or sample from their distribution, and then pass the result through an embedding layer to get the next input.
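A sketch of the greedy variant (the embedding matrix and the GO_ID/EOS_ID token ids are hypothetical names, not part of your code):

    # Hypothetical vocabulary details: an embedding matrix and GO/EOS ids.
    embedding = tf.get_variable('embedding', [self.n_symbols, self.embed_size])
    start_tokens = tf.fill([self.batch_size], GO_ID)  # int32, shape (batch_size,)

    # At step t+1 the helper embeds argmax(logits_t) and feeds it back in.
    infer_helper = seq2seq.GreedyEmbeddingHelper(
        embedding=embedding,
        start_tokens=start_tokens,
        end_token=EOS_ID)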
Putting it together: the Seq2Seq model

- Get the final state of the encoder layer and pass it as initial_state to the decoder.
- Build decoder_train and decoder_inference using seq2seq.dynamic_decode(); when calling both methods, make sure the weights are shared (use variable_scope to reuse the weights). A sketch is given below.
- Compute the loss with seq2seq.sequence_loss.

An example code is given here and here.
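Here is a sketch of how the missing decoder methods could look under the assumptions above (GO_ID, EOS_ID and self.embedding are hypothetical; weight sharing is obtained by reusing the 'Decoder' variable scope, which also requires the inference-time inputs, i.e. the embedded ids, to have the same dimensionality as the training-time inputs):

    def decoder_train(self):
        with tf.variable_scope('Decoder'):
            helper = seq2seq.TrainingHelper(self.decoder_inputs, self.seqs_len)
            decoder = seq2seq.BasicDecoder(self.decoder_cell, helper,
                                           self.encoder_final_state)
            # TF 1.1 returns (outputs, state); later 1.x versions also
            # return the final sequence lengths.
            outputs, _ = seq2seq.dynamic_decode(decoder,
                                                maximum_iterations=self.max_len)
            self.train_logits = outputs.rnn_output

    def decoder_inference(self):
        # reuse=True so both decoders share the same weights.
        with tf.variable_scope('Decoder', reuse=True):
            helper = seq2seq.GreedyEmbeddingHelper(
                self.embedding,                     # hypothetical embedding matrix
                tf.fill([self.batch_size], GO_ID),  # hypothetical start token id
                EOS_ID)                             # hypothetical end token id
            decoder = seq2seq.BasicDecoder(self.decoder_cell, helper,
                                           self.encoder_final_state)
            outputs, _ = seq2seq.dynamic_decode(decoder,
                                                maximum_iterations=self.max_len)
            self.infer_ids = outputs.sample_id

    def loss(self):
        # weights masks out the padded positions of each target sequence.
        weights = tf.sequence_mask(self.seqs_len, self.max_len, dtype=tf.float32)
        self.seq_loss = seq2seq.sequence_loss(logits=self.train_logits,
                                              targets=self.decoder_targets,
                                              weights=weights)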