What does it mean to "unroll an RNN dynamically"? I've seen this specifically mentioned in the TensorFlow source code, but I'm looking for a conceptual explanation that extends to RNNs in general.
In the TensorFlow rnn method, it is documented:

If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time).
But in the dynamic_rnn method it mentions:

The parameter sequence_length is optional and is used to copy-through state and zero-out outputs when past a batch element's sequence length. So it's more for correctness than performance, unlike in rnn().
So does this mean rnn is more performant for variable-length sequences? What is the conceptual difference between dynamic_rnn and rnn?
Unrolling Recurrent Neural Networks: RNNs are fit and make predictions over many time steps. A useful way to visualise an RNN is as the update graph formed by "unfolding" (or "unrolling") the network along the input sequence, with one copy of the cell per timestep.
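As a rough sketch in plain NumPy (all names and sizes here are illustrative, not from any library), the unrolled computation is just the same cell applied once per timestep:

```python
import numpy as np

def cell(x_t, h_prev, U, W, b):
    # One RNN step: new state from the current input and the previous state.
    return np.tanh(x_t @ U + h_prev @ W + b)

T, n_in, n_hid = 5, 3, 4                      # sequence length and sizes (illustrative)
x = np.random.randn(T, n_in)                  # one input sequence
U = np.random.randn(n_in, n_hid)              # input-to-state weights
W = np.random.randn(n_hid, n_hid)             # state-to-state weights
b = np.zeros(n_hid)

h = np.zeros(n_hid)                           # initial state h0
outputs = []
for t in range(T):                            # unrolling: one "layer" per timestep
    h = cell(x[t], h, U, W, b)                # the same weights are reused at every step
    outputs.append(h)
```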
From the documentation I understand that the parameter sequence_length in the rnn method affects performance: when it is set, dynamic calculation is performed and the computation stops early.

For example, if the largest input sequence has a length of 50 and the other sequences are shorter, it is better to set sequence_length for each sequence, so that the computation for each sequence stops when that sequence ends instead of running over the padding zeros until reaching 50 timesteps. However, if sequence_length is not provided, every sequence is treated as having the same length, so the zeros used for padding are processed as normal items in the sequence.
This does not mean that dynamic_rnn is less performant. The documentation says that there the parameter sequence_length does not affect performance, because the computation is already dynamic; it matters for correctness instead.
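For illustration, here is a minimal sketch of how sequence_length is typically supplied to tf.nn.dynamic_rnn (TensorFlow 1.x API; the sizes and placeholder names are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x API, as in the question

n_steps, n_inputs, n_units = 50, 10, 32                     # illustrative sizes
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])   # batch padded to 50 steps
seq_len = tf.placeholder(tf.int32, [None])                  # true length of each sequence

cell = tf.nn.rnn_cell.BasicRNNCell(n_units)
# With sequence_length set, outputs past each sequence's true length are zeroed
# and the final state is copied through from the last real step.
outputs, state = tf.nn.dynamic_rnn(cell, X, sequence_length=seq_len, dtype=tf.float32)
```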
Also, according to this post about RNNs in TensorFlow:
Internally, tf.nn.rnn creates an unrolled graph for a fixed RNN length. That means, if you call tf.nn.rnn with inputs having 200 time steps you are creating a static graph with 200 RNN steps. First, graph creation is slow. Second, you’re unable to pass in longer sequences (> 200) than you’ve originally specified.
tf.nn.dynamic_rnn solves this. It uses a tf.While loop to dynamically construct the graph when it is executed. That means graph creation is faster and you can feed batches of variable size. What about performance? You may think the static rnn is faster than its dynamic counterpart because it pre-builds the graph. In my experience that’s not the case.
In short, just use tf.nn.dynamic_rnn. There is no benefit to tf.nn.rnn and I wouldn’t be surprised if it was deprecated in the future.
In other words, dynamic_rnn is just as fast or even faster, so the author suggests using dynamic_rnn anyway.
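As a rough sketch of the difference between the two APIs (TensorFlow 1.x, where tf.nn.rnn was later renamed tf.nn.static_rnn; all sizes here are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

n_steps, n_inputs, n_units = 200, 10, 32      # illustrative sizes

# Static unrolling: tf.nn.static_rnn (formerly tf.nn.rnn) takes a Python list
# with one tensor per timestep and adds n_steps copies of the cell to the graph.
with tf.variable_scope("static"):
    X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
    X_list = tf.unstack(X, axis=1)            # 200 tensors of shape [batch, n_inputs]
    cell = tf.nn.rnn_cell.BasicRNNCell(n_units)
    static_outputs, _ = tf.nn.static_rnn(cell, X_list, dtype=tf.float32)

# Dynamic unrolling: tf.nn.dynamic_rnn takes one 3-D tensor and runs the cell
# in a while loop, so the time dimension can even be unknown at graph-build time.
with tf.variable_scope("dynamic"):
    X_dyn = tf.placeholder(tf.float32, [None, None, n_inputs])
    cell_dyn = tf.nn.rnn_cell.BasicRNNCell(n_units)
    dynamic_outputs, _ = tf.nn.dynamic_rnn(cell_dyn, X_dyn, dtype=tf.float32)
```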
To better understand dynamic unrolling, consider how you would create an RNN from scratch in TensorFlow (that is, without using any RNN library) for a 2-timestep input.
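A minimal sketch of such a two-timestep graph (TensorFlow 1.x; the weight names Wx, Wy, b and all sizes are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

n_inputs, n_units = 3, 5                                  # illustrative sizes

X1 = tf.placeholder(tf.float32, [None, n_inputs])         # input at timestep 1
X2 = tf.placeholder(tf.float32, [None, n_inputs])         # input at timestep 2

Wx = tf.Variable(tf.random_normal([n_inputs, n_units]))   # input-to-state weights
Wy = tf.Variable(tf.random_normal([n_units, n_units]))    # state-to-state weights
b  = tf.Variable(tf.zeros([1, n_units]))

Y1 = tf.tanh(tf.matmul(X1, Wx) + b)                       # output at timestep 1
Y2 = tf.tanh(tf.matmul(Y1, Wy) + tf.matmul(X2, Wx) + b)   # output at timestep 2
```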
It's clear that we get two outputs, one for each timestep. Keep in mind that Y2 depends not only on X2 but also, indirectly, on X1, via Y1.
Now consider you have 50 timesteps of inputs, X1 through X50. In this case, you would have to create 50 outputs, Y1 through Y50. This is what TensorFlow does by dynamic unrolling: it creates these 50 outputs for you via tf.nn.dynamic_rnn().
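A sketch of the same model with dynamic unrolling (again TensorFlow 1.x, illustrative sizes):

```python
import tensorflow as tf  # TensorFlow 1.x

n_steps, n_inputs, n_units = 50, 3, 5
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])  # X1..X50 as one tensor

cell = tf.nn.rnn_cell.BasicRNNCell(n_units)
# dynamic_rnn runs the cell in a while loop, producing Y1..Y50 in `outputs`
# with shape [batch, 50, n_units] -- no need to wire up 50 steps by hand.
outputs, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
```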
I hope this helps.
LSTM (or GRU) cells are the basis of both.
Imagine an RNN as a stacked deep net with one layer per timestep. The depth of this net should depend on (actually be equal to) the actual input and output lengths, and nothing else, since the weights are the same in all the layers anyway.
Now, the classic way to build this is to group input-output pairs into buckets of fixed maximum length (e.g. model_with_buckets()). DynRNN breaks with this constraint and adapts to the actual sequence lengths.
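As a rough illustration of the bucketing idea (plain Python, with made-up bucket sizes):

```python
# Bucketing sketch: group sequences by length and pad only up to the bucket size,
# so the static graph is built once per bucket rather than once per length.
buckets = [10, 20, 50]                         # illustrative bucket boundaries

def assign_bucket(seq):
    # Smallest bucket that fits the sequence; anything longer than the
    # largest bucket would have to be truncated or dropped.
    for size in buckets:
        if len(seq) <= size:
            return size
    return buckets[-1]

def pad(seq, size):
    return seq + [0] * (size - len(seq))       # zero-pad to the bucket length

sequences = [[1, 2, 3], [4] * 15, [5] * 40]
batches = {}
for seq in sequences:
    size = assign_bucket(seq)
    batches.setdefault(size, []).append(pad(seq[:size], size))
```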
So there is no real trade-off here, except maybe that you'll have to rewrite older code in order to adapt.