swap_memory in dynamic_rnn allows quasi-infinite sequences?

I am trying to tag letters in long char-sequences. The inherent structure of the data requires me to use a bidirectional approach.

Furthermore, based on this idea, I need access to the hidden state at each timestep, not just the final one.

To try the idea out, I used a fixed-length approach: I currently take batches of random pieces of, say, 60 characters each out of my much longer sequences and run my hand-made bidirectional classifier with zero_state as the initial_state for each 60-character piece.

This worked fine, but obviously not perfectly, as in reality the sequences are longer and the information to the left and right of the piece I randomly cut from the original source is lost.

Now, in order to advance, I want to work with the entire sequences. They vary heavily in length, though, and there is no way I'll fit the entire sequences (batched, furthermore) onto the GPU.

I found the swap_memory parameter in the dynamic_rnn documentation. Would that help?

I didn't find any further documentation that helped me understand, and I cannot easily try this out myself: because I need access to the hidden states at each timestep, I coded the current graph without using any of the higher-level wrappers (such as dynamic_rnn). Trying this out would require me to get all the intermediate states out of the wrapper, which, as I understand it, is a lot of work to implement.

Before going through the hassle of trying this out, I would love to be sure that this would indeed solve my memory issue. Thanks for any hints!

Asked May 18 '17 by Phillip Bock
1 Answer

TL;DR: swap_memory won't let you work with pseudo-infinite sequences, but it will help you fit bigger (longer, or wider, or larger-batch) sequences in memory. There is a separate trick for pseudo-infinite sequences, but it only applies to unidirectional RNNs.


swap_memory

During training, a NN (including an RNN) generally needs to save some activations in memory -- they are needed to calculate the gradient.

What swap_memory does is tell your RNN to store them in host (CPU) memory instead of device (GPU) memory, and stream them back to the GPU when they are needed.

Effectively, this lets you pretend that your GPU has more memory than it actually does (at the expense of CPU memory, which tends to be more plentiful).

You still have to pay the computational cost of using very long sequences. Not to mention that you might run out of host memory.

To use it, simply give that argument the value True.
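A minimal sketch of what that looks like with the TF 1.x API (the shapes and layer sizes here are made up for illustration):

    import tensorflow as tf  # TF 1.x API, as in the question

    # Hypothetical shapes: a batch of long, one-hot encoded character sequences.
    batch_size, max_time, input_dim, num_units = 32, 10000, 128, 256

    inputs = tf.placeholder(tf.float32, [batch_size, max_time, input_dim])
    cell = tf.nn.rnn_cell.LSTMCell(num_units)

    # swap_memory=True tells the while_loop underlying dynamic_rnn to offload
    # forward-pass activations to host (CPU) memory and stream them back to
    # the GPU during the backward pass.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, dtype=tf.float32, swap_memory=True)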


sequence_length

Use this parameter if your sequences are of different lengths. sequence_length has a misleading name - it's actually an array of sequence lengths.

You still need as much memory as you would have needed if all your sequences were of the same length (the max_time dimension).
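For example (again a sketch, with made-up lengths): you pass the padded batch plus an array of true lengths.

    import tensorflow as tf

    # Hypothetical batch of 3 sequences with true lengths 45, 60 and 52,
    # padded to a common max_time of 60.
    seq_lens = tf.constant([45, 60, 52], dtype=tf.int32)
    inputs = tf.placeholder(tf.float32, [3, 60, 128])
    cell = tf.nn.rnn_cell.LSTMCell(256)

    # Steps beyond a sequence's true length produce zero outputs and simply
    # copy the state through, so the padding does not corrupt the result --
    # but the padded tensor still has to fit in memory.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, sequence_length=seq_lens, dtype=tf.float32)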


tf.nn.bidirectional_dynamic_rnn

TF includes a ready-made implementation of bidirectional RNNs, so it might be easier to use it instead of rolling your own.
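Since it returns the outputs (i.e. the top-layer hidden state) at every timestep in both directions, it also gives you what you need for per-character tagging. A rough sketch, assuming LSTM cells and hypothetical sizes:

    import tensorflow as tf

    inputs = tf.placeholder(tf.float32, [None, None, 128])   # [batch, time, features]
    seq_lens = tf.placeholder(tf.int32, [None])

    cell_fw = tf.nn.rnn_cell.LSTMCell(256)
    cell_bw = tf.nn.rnn_cell.LSTMCell(256)

    # outputs is a pair (output_fw, output_bw), each of shape
    # [batch, time, num_units]: the hidden state at every timestep.
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, inputs,
        sequence_length=seq_lens,
        dtype=tf.float32,
        swap_memory=True)

    # Concatenate both directions for per-character classification.
    per_step_states = tf.concat([out_fw, out_bw], axis=-1)   # [batch, time, 512]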


Stateful RNNs

To deal with very long sequences when training unidirectional RNNs, people do something else: they save the final hidden states of every batch and use them as the initial hidden state for the next batch. (For this to work, the next batch has to be composed of the continuations of the previous batch's sequences.)

These threads discuss how this can be done in TF:

TensorFlow: Remember LSTM state for next batch (stateful LSTM)

How do I set TensorFlow RNN state when state_is_tuple=True?
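A minimal sketch of this stateful pattern (assuming an LSTM cell; consecutive_chunks is a placeholder for whatever pipeline yields consecutive chunks of the same batch of sequences):

    import numpy as np
    import tensorflow as tf

    batch_size, num_units, input_dim = 32, 256, 128

    inputs = tf.placeholder(tf.float32, [batch_size, None, input_dim])
    # The previous batch's final state is fed back in through placeholders.
    c_in = tf.placeholder(tf.float32, [batch_size, num_units])
    h_in = tf.placeholder(tf.float32, [batch_size, num_units])
    initial_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

    cell = tf.nn.rnn_cell.LSTMCell(num_units)
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, initial_state=initial_state)

    def consecutive_chunks(num_chunks=3, chunk_len=60):
        """Placeholder data pipeline: yields consecutive chunks of the same
        batch of sequences; replace with the real character data."""
        for _ in range(num_chunks):
            yield np.zeros([batch_size, chunk_len, input_dim], np.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Start from zeros, then carry the state across chunks.
        state = tf.nn.rnn_cell.LSTMStateTuple(
            np.zeros([batch_size, num_units], np.float32),
            np.zeros([batch_size, num_units], np.float32))
        for chunk in consecutive_chunks():
            out_val, state = sess.run(
                [outputs, final_state],
                feed_dict={inputs: chunk, c_in: state.c, h_in: state.h})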

Answered Oct 04 '22 by MWB