What does it mean to "unroll an RNN dynamically"? I've seen this specifically mentioned in the TensorFlow source code, but I'm looking for a conceptual explanation that extends to RNNs in general.
In the TensorFlow rnn method, it is documented:

If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time).
But in the dynamic_rnn method it mentions:

The parameter sequence_length is optional and is used to copy-through state and zero-out outputs when past a batch element's sequence length. So it's more for correctness than performance, unlike in rnn().
So does this mean rnn is more performant for variable-length sequences? What is the conceptual difference between dynamic_rnn and rnn?
Unrolling Recurrent Neural Networks: RNNs are fit and make predictions over many time steps. A useful way to visualise an RNN is as the update graph formed by "unfolding" (or "unrolling") the network along the input sequence, with one copy of the cell per timestep.
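As a rough sketch in plain NumPy (all names and sizes here are illustrative, not from any library), the unrolled computation is just the same cell applied once per timestep:

```python
import numpy as np

def cell(x_t, h_prev, U, W, b):
    # One RNN step: new state from the current input and the previous state.
    return np.tanh(x_t @ U + h_prev @ W + b)

T, n_in, n_hid = 5, 3, 4                      # sequence length and sizes (illustrative)
x = np.random.randn(T, n_in)                  # one input sequence
U = np.random.randn(n_in, n_hid)              # input-to-state weights
W = np.random.randn(n_hid, n_hid)             # state-to-state weights
b = np.zeros(n_hid)

h = np.zeros(n_hid)                           # initial state h0
outputs = []
for t in range(T):                            # unrolling: one "layer" per timestep
    h = cell(x[t], h, U, W, b)                # the same weights are reused at every step
    outputs.append(h)
```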
From the documentation I understand that the parameter sequence_length in the rnn method affects performance: when it is set, dynamic calculation is performed and the computation stops early.

For example, if the largest input sequence has a length of 50 and the other sequences are shorter, it is better to set sequence_length for each sequence, so that the computation for each sequence stops when that sequence ends instead of running over the padding zeros until reaching 50 timesteps. However, if sequence_length is not provided, every sequence is treated as having the same length, so the zeros used for padding are processed as normal items in the sequence.
This does not mean that dynamic_rnn is less performant. The documentation says that there the parameter sequence_length does not affect performance, because the computation is already dynamic; it matters for correctness instead.
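For illustration, here is a minimal sketch of how sequence_length is typically supplied to tf.nn.dynamic_rnn (TensorFlow 1.x API; the sizes and placeholder names are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x API, as in the question

n_steps, n_inputs, n_units = 50, 10, 32                     # illustrative sizes
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])   # batch padded to 50 steps
seq_len = tf.placeholder(tf.int32, [None])                  # true length of each sequence

cell = tf.nn.rnn_cell.BasicRNNCell(n_units)
# With sequence_length set, outputs past each sequence's true length are zeroed
# and the final state is copied through from the last real step.
outputs, state = tf.nn.dynamic_rnn(cell, X, sequence_length=seq_len, dtype=tf.float32)
```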
Also, according to this post about RNNs in TensorFlow:
Internally, tf.nn.rnn creates an unrolled graph for a fixed RNN length. That means, if you call tf.nn.rnn with inputs having 200 time steps you are creating a static graph with 200 RNN steps. First, graph creation is slow. Second, you’re unable to pass in longer sequences (> 200) than you’ve originally specified.
tf.nn.dynamic_rnn solves this. It uses a tf.While loop to dynamically construct the graph when it is executed. That means graph creation is faster and you can feed batches of variable size. What about performance? You may think the static rnn is faster than its dynamic counterpart because it pre-builds the graph. In my experience that’s not the case.
In short, just use tf.nn.dynamic_rnn. There is no benefit to tf.nn.rnn and I wouldn’t be surprised if it was deprecated in the future.
In other words, dynamic_rnn is just as fast or even faster, so the author suggests using dynamic_rnn anyway.
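As a rough sketch of the difference between the two APIs (TensorFlow 1.x, where tf.nn.rnn was later renamed tf.nn.static_rnn; all sizes here are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

n_steps, n_inputs, n_units = 200, 10, 32      # illustrative sizes

# Static unrolling: tf.nn.static_rnn (formerly tf.nn.rnn) takes a Python list
# with one tensor per timestep and adds n_steps copies of the cell to the graph.
with tf.variable_scope("static"):
    X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
    X_list = tf.unstack(X, axis=1)            # 200 tensors of shape [batch, n_inputs]
    cell = tf.nn.rnn_cell.BasicRNNCell(n_units)
    static_outputs, _ = tf.nn.static_rnn(cell, X_list, dtype=tf.float32)

# Dynamic unrolling: tf.nn.dynamic_rnn takes one 3-D tensor and runs the cell
# in a while loop, so the time dimension can even be unknown at graph-build time.
with tf.variable_scope("dynamic"):
    X_dyn = tf.placeholder(tf.float32, [None, None, n_inputs])
    cell_dyn = tf.nn.rnn_cell.BasicRNNCell(n_units)
    dynamic_outputs, _ = tf.nn.dynamic_rnn(cell_dyn, X_dyn, dtype=tf.float32)
```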
To better understand dynamic unrolling, consider how you would create an RNN from scratch in TensorFlow (that is, without using any RNN library) for a 2-timestep input.
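A minimal sketch of such a two-timestep graph (TensorFlow 1.x; the weight names Wx, Wy, b and all sizes are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

n_inputs, n_units = 3, 5                                  # illustrative sizes

X1 = tf.placeholder(tf.float32, [None, n_inputs])         # input at timestep 1
X2 = tf.placeholder(tf.float32, [None, n_inputs])         # input at timestep 2

Wx = tf.Variable(tf.random_normal([n_inputs, n_units]))   # input-to-state weights
Wy = tf.Variable(tf.random_normal([n_units, n_units]))    # state-to-state weights
b  = tf.Variable(tf.zeros([1, n_units]))

Y1 = tf.tanh(tf.matmul(X1, Wx) + b)                       # output at timestep 1
Y2 = tf.tanh(tf.matmul(Y1, Wy) + tf.matmul(X2, Wx) + b)   # output at timestep 2
```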
It's clear that we get two outputs, one for each timestep. Keep in mind that Y2 depends not only on X2 but also, indirectly, on X1, via Y1.
Now consider you have 50 timesteps of inputs, X1 through X50. In this case, you would have to create 50 outputs, Y1 through Y50. This is what TensorFlow does by dynamic unrolling: it creates these 50 outputs for you via tf.nn.dynamic_rnn().
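A sketch of the same model with dynamic unrolling (again TensorFlow 1.x, illustrative sizes):

```python
import tensorflow as tf  # TensorFlow 1.x

n_steps, n_inputs, n_units = 50, 3, 5
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])  # X1..X50 as one tensor

cell = tf.nn.rnn_cell.BasicRNNCell(n_units)
# dynamic_rnn runs the cell in a while loop, producing Y1..Y50 in `outputs`
# with shape [batch, 50, n_units] -- no need to wire up 50 steps by hand.
outputs, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
```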
I hope this helps.
LSTM (or GRU) cells are the basis of both.
Imagine an RNN as a stacked deep net with one layer per timestep. The depth of this net should depend on (actually be equal to) the actual input and output lengths, and nothing else, since the weights are the same in all the layers anyway.
Now, the classic way to build this is to group input-output pairs into buckets of fixed maximum length (e.g. model_with_buckets()). DynRNN breaks with this constraint and adapts to the actual sequence lengths.
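As a rough illustration of the bucketing idea (plain Python, with made-up bucket sizes):

```python
# Bucketing sketch: group sequences by length and pad only up to the bucket size,
# so the static graph is built once per bucket rather than once per length.
buckets = [10, 20, 50]                         # illustrative bucket boundaries

def assign_bucket(seq):
    # Smallest bucket that fits the sequence; anything longer than the
    # largest bucket would have to be truncated or dropped.
    for size in buckets:
        if len(seq) <= size:
            return size
    return buckets[-1]

def pad(seq, size):
    return seq + [0] * (size - len(seq))       # zero-pad to the bucket length

sequences = [[1, 2, 3], [4] * 15, [5] * 40]
batches = {}
for seq in sequences:
    size = assign_bucket(seq)
    batches.setdefault(size, []).append(pad(seq[:size], size))
```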
So there is no real trade-off here, except maybe that you'll have to rewrite older code in order to adapt.