I am building a dynamic RNN network by stacking multiple LSTMs. I see there are two options:
# cells_fw and cells_bw are lists of cells, e.g. LSTM cells
stacked_cell_fw = tf.contrib.rnn.MultiRNNCell(cells_fw)
stacked_cell_bw = tf.contrib.rnn.MultiRNNCell(cells_bw)
output = tf.nn.bidirectional_dynamic_rnn(
    stacked_cell_fw, stacked_cell_bw, INPUT,
    sequence_length=LENGTHS, dtype=tf.float32)
vs
output = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    cells_fw, cells_bw, INPUT,
    sequence_length=LENGTHS, dtype=tf.float32)
What is the difference between the two approaches, and is one better than the other?
If you want to have multiple layers that pass information backward or forward in time, there are two ways to design this. Assume the forward direction consists of two layers F1, F2 and the backward direction consists of two layers B1, B2.
If you use tf.nn.bidirectional_dynamic_rnn, the model looks like this (time flows from left to right): F1 feeds into F2 and B1 feeds into B2, but the two directions form independent stacks that only meet when their final outputs are concatenated.
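To make the wiring concrete, here is a minimal sketch of this first variant; the dimensions, layer sizes, and placeholder names are assumptions for illustration, not taken from the question:

import tensorflow as tf  # assumes TF 1.x, where tf.contrib is still available

# hypothetical dimensions for illustration
inputs = tf.placeholder(tf.float32, [32, 20, 64])   # [batch, time, features]
lengths = tf.placeholder(tf.int32, [32])

# F1, F2 and B1, B2 as two independent stacks
cells_fw = [tf.contrib.rnn.LSTMCell(128) for _ in range(2)]  # F1, F2
cells_bw = [tf.contrib.rnn.LSTMCell(128) for _ in range(2)]  # B1, B2
stacked_cell_fw = tf.contrib.rnn.MultiRNNCell(cells_fw)
stacked_cell_bw = tf.contrib.rnn.MultiRNNCell(cells_bw)

# F2 only ever sees F1's output; B2 only ever sees B1's output
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    stacked_cell_fw, stacked_cell_bw, inputs,
    sequence_length=lengths, dtype=tf.float32)

# the two directions meet only here, after both stacks have run
combined = tf.concat([out_fw, out_bw], axis=-1)  # [32, 20, 256]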
If you use tf.contrib.rnn.stack_bidirectional_dynamic_rnn, the model looks like this instead: each level is a complete bidirectional layer, and its concatenated forward/backward output is the input of the next level.
Between each pair of layers there is a concatenation: the outputs of the forward and backward cells are concatenated and fed to both the forward and backward cells of the next layer up. This means F2 and B2 receive exactly the same input, and there is an explicit connection between the backward and forward layers. In "Speech Recognition with Deep Recurrent Neural Networks", Graves et al. summarize this as follows:
... every hidden layer receives input from both the forward and backward layers at the level below.
In the unstacked BiRNN (the first variant) this connection only happens implicitly, namely when the final forward and backward outputs are mapped back to the output. The stacked BiRNN usually performed better for my purposes, but that probably depends on your problem setting. Either way, it is worthwhile to try out!
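For comparison, here is a minimal sketch of the stacked variant under the same assumed dimensions as the first sketch; note that its output already comes back concatenated:

import tensorflow as tf  # assumes TF 1.x

inputs = tf.placeholder(tf.float32, [32, 20, 64])   # [batch, time, features]
lengths = tf.placeholder(tf.int32, [32])

# one bidirectional layer per level; after each level the forward and
# backward outputs are concatenated and fed to BOTH cells of the next level
cells_fw = [tf.contrib.rnn.LSTMCell(128) for _ in range(2)]
cells_bw = [tf.contrib.rnn.LSTMCell(128) for _ in range(2)]
outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    cells_fw, cells_bw, inputs,
    sequence_length=lengths, dtype=tf.float32)
# outputs is already the concatenation of the last layer's two
# directions: [32, 20, 256]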
EDIT
In response to your comment: I base my answer on the documentation of tf.contrib.rnn.stack_bidirectional_dynamic_rnn, which says:
Stacks several bidirectional rnn layers. The combined forward and backward layer outputs are used as input of the next layer. tf.bidirectional_rnn does not allow to share forward and backward information between layers.
Also, I looked at the implementation of stack_bidirectional_dynamic_rnn itself.