
What does it mean to unroll an RNN dynamically?

What does it mean to "unroll a RNN dynamically". I've seen this specifically mentioned in the Tensorflow source code, but I'm looking for a conceptual explanation that extends to RNN in general.

The documentation of the TensorFlow rnn method says:

If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time),

But the documentation of the dynamic_rnn method says:

The parameter sequence_length is optional and is used to copy-through state and zero-out outputs when past a batch element's sequence length. So it's more for correctness than performance, unlike in rnn().

So does this mean rnn is more performant for variable-length sequences? What is the conceptual difference between dynamic_rnn and rnn?

asked Aug 14 '16 by Xiv



3 Answers

From the documentation I understand that the sequence_length parameter in the rnn method affects performance: when it is set, rnn performs dynamic calculation and stops early, at the end of each sequence.

For example, if the longest input sequence has a length of 50 and the other sequences are shorter, it is better to set sequence_length for each sequence, so that the computation for each sequence stops when that sequence ends rather than processing the padding zeros out to 50 time steps. However, if sequence_length is not provided, every sequence is treated as having the same length, so the zeros used for padding are processed as normal items in the sequence.

This does not mean that dynamic_rnn is less performant; the documentation says that there the sequence_length parameter does not affect performance, because the computation is already dynamic.
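
As a minimal sketch of how this looks in practice (TF 1.x API; the shapes, sizes, and cell choice here are my own example, not from the question):

    import tensorflow as tf  # TF 1.x API

    max_len, input_dim, hidden = 50, 8, 16  # example sizes

    inputs = tf.placeholder(tf.float32, [None, max_len, input_dim])
    seq_len = tf.placeholder(tf.int32, [None])  # true length of each sequence

    cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
    outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                       sequence_length=seq_len,
                                       dtype=tf.float32)
    # outputs has shape [batch, max_len, hidden]; rows past seq_len[i] are
    # zeroed out, and state holds each sequence's state at its true last step.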

Also, according to this post about RNNs in TensorFlow:

Internally, tf.nn.rnn creates an unrolled graph for a fixed RNN length. That means, if you call tf.nn.rnn with inputs having 200 time steps you are creating a static graph with 200 RNN steps. First, graph creation is slow. Second, you’re unable to pass in longer sequences (> 200) than you’ve originally specified.

tf.nn.dynamic_rnn solves this. It uses a tf.While loop to dynamically construct the graph when it is executed. That means graph creation is faster and you can feed batches of variable size. What about performance? You may think the static rnn is faster than its dynamic counterpart because it pre-builds the graph. In my experience that’s not the case.

In short, just use tf.nn.dynamic_rnn. There is no benefit to tf.nn.rnn and I wouldn’t be surprised if it was deprecated in the future.

dynamic_rnn is as fast or faster, so he suggests using dynamic_rnn anyway.
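
To make the contrast concrete, here is a hedged sketch of both calls (TF 1.x API; in later 1.x releases tf.nn.rnn was renamed tf.nn.static_rnn, which is what I use below; the shapes are invented for the example):

    import tensorflow as tf  # TF 1.x API

    T, dim, hidden = 200, 10, 64
    x = tf.placeholder(tf.float32, [None, T, dim])

    # Static unrolling: the graph contains T copies of the cell, so graph
    # construction is slow and the model is fixed to exactly T steps.
    with tf.variable_scope("static"):
        cell = tf.nn.rnn_cell.GRUCell(hidden)
        static_out, _ = tf.nn.static_rnn(cell, tf.unstack(x, axis=1),
                                         dtype=tf.float32)

    # Dynamic unrolling: one cell inside a tf.while_loop, built once and
    # iterated at run time. It would also accept a [None, None, dim]
    # placeholder, i.e. batches with different numbers of time steps.
    with tf.variable_scope("dynamic"):
        cell = tf.nn.rnn_cell.GRUCell(hidden)
        dynamic_out, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)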

answered by Guillem Cucurull


To better understand dynamic unrolling, consider how you would create an RNN from scratch in TensorFlow (that is, without using any RNN library) for an input with two time steps:

  1. Create two placeholders, X1 and X2.
  2. Create two weight variables, Wx and Wy, and a bias b.
  3. Calculate the outputs, Y1 = fn(X1 x Wx + b) and Y2 = fn(X2 x Wx + Y1 x Wy + b).

It's clear that we get two outputs, one for each time step. Keep in mind that Y2 depends on X2 directly and on X1 indirectly, via Y1.
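
In code, a minimal sketch of these three steps might look like this (TF 1.x API; the sizes and the choice of tanh as the activation fn are mine, for illustration):

    import tensorflow as tf  # TF 1.x API

    n_inputs, n_neurons = 3, 5  # example sizes, chosen arbitrarily

    # 1. One placeholder per time step
    X1 = tf.placeholder(tf.float32, [None, n_inputs])
    X2 = tf.placeholder(tf.float32, [None, n_inputs])

    # 2. Shared weights and bias, reused at every time step
    Wx = tf.Variable(tf.random_normal([n_inputs, n_neurons]))
    Wy = tf.Variable(tf.random_normal([n_neurons, n_neurons]))
    b = tf.Variable(tf.zeros([1, n_neurons]))

    # 3. One output per time step; Y2 sees X1 only through Y1
    Y1 = tf.tanh(tf.matmul(X1, Wx) + b)
    Y2 = tf.tanh(tf.matmul(X2, Wx) + tf.matmul(Y1, Wy) + b)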

Now suppose you have 50 time steps of input, X1 through X50. In that case, you would have to create 50 outputs, Y1 through Y50. This is what TensorFlow does with dynamic unrolling: it creates these 50 outputs for you inside tf.nn.dynamic_rnn().

I hope this helps.

answered by Ratnaraj


LSTM (or GRU) cells are the basis of both.

Imagine an RNN as a stacked deep net with

  • weight sharing (the weight and bias matrices are the same in all layers)
  • input coming "from the side" into each layer
  • one output per layer, interpreted by the higher layers (e.g. a decoder)

The depth of this net should depend on (in fact, equal) the actual input and output lengths, and on nothing else, since the weights are the same in all layers anyway.
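
In pseudo-Python, this unrolled view is just a loop that reuses one cell (a conceptual sketch, not any particular library's API):

    # Conceptual sketch: unrolling = applying the same cell once per input step
    def unrolled_rnn(cell, inputs, state):
        outputs = []
        for x_t in inputs:                 # one "layer" per time step
            y_t, state = cell(x_t, state)  # same weights inside cell each time
            outputs.append(y_t)
        return outputs, state

    # A static unroll fixes len(inputs) when the graph is built; a dynamic
    # unroll runs this loop at execution time for whatever length arrives.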

Now, the classic way to build this is to group input-output pairs into buckets with fixed maximum lengths (e.g. model_with_buckets()). The dynamic RNN breaks with this constraint and adapts to the actual sequence lengths.

So there is no real trade-off here, except perhaps that you'll have to rewrite older code to adapt.

answered by Phillip Bock