 

Training recurrent neural network when using GPU acceleration by TensorFlow

I have a basic knowledge of parallel computing (including some CUDA), feedforward neural networks, and recurrent neural networks (and how they use BPTT).

When using, for example, TensorFlow, you can apply GPU acceleration to the training phase of a network. But recurrent neural networks are sequential in nature: each timestep depends on the previous one, the next depends on the current one, and so on.

How can GPU acceleration work if this is the case? Is everything that can be computed in parallel computed that way, while the timestep-dependent parts are serialized?

asked Mar 10 '23 by Stephen Johnson

2 Answers

RNNs train using backpropagation through time. The recurrent network is unfolded into a directed acyclic graph of finite length, which looks just like a normal feedforward net. It is then trained with stochastic gradient descent, with the constraint that the weights at every timestep must be equal (weight sharing).

Once you see that training is just constrained backpropagation on sequences of a given length, there is nothing about the sequential nature that prevents the usual parallelism: the steps themselves run in order, but within each timestep the matrix multiplications across the batch and the hidden units are fully parallel on the GPU.
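To make the unrolling concrete, here is a minimal sketch in TensorFlow 2 of a vanilla RNN unrolled by hand. The shapes, variable names, and dummy loss are illustrative assumptions, not anything from the answer above; the point is that the same weight tensors are reused at every timestep, and each step is a batched matmul of the kind a GPU parallelizes well:

```python
import tensorflow as tf

# Illustrative shapes (assumptions, not from the answer above).
batch, timesteps, features, hidden = 64, 20, 8, 32

# One set of weights, shared across all timesteps.
W_x = tf.Variable(tf.random.normal([features, hidden], stddev=0.1))
W_h = tf.Variable(tf.random.normal([hidden, hidden], stddev=0.1))
b = tf.Variable(tf.zeros([hidden]))

def unrolled_rnn(x):
    # x: [batch, timesteps, features]
    h = tf.zeros([batch, hidden])
    for t in range(timesteps):  # the unrolled DAG: one "layer" per timestep
        # The SAME W_x, W_h, b appear at every step -- the weight-sharing
        # constraint. Each step is a batched matmul, which the GPU
        # parallelizes across the batch and hidden dimensions.
        h = tf.tanh(x[:, t, :] @ W_x + h @ W_h + b)
    return h

with tf.GradientTape() as tape:
    out = unrolled_rnn(tf.random.normal([batch, timesteps, features]))
    loss = tf.reduce_mean(out ** 2)  # dummy loss, just to backprop through time
grads = tape.gradient(loss, [W_x, W_h, b])
```

The tape's gradient pass walks back through all twenty unrolled steps, which is exactly backpropagation through time on the finite graph the answer describes.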

answered May 12 '23 by convolutionBoy


The way you get good GPU performance when training recurrent neural networks is to use a batch size large enough that the forward/backward pass through a single cell step involves enough compute to keep the GPU busy.
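As a rough illustration of that advice, here is a hedged sketch using tf.keras; the data shapes, batch size, and model are hypothetical. With a batch of 512, each of the 50 sequential LSTM steps becomes one large batched matrix multiplication, which is what keeps the GPU occupied despite the step-by-step dependency:

```python
import tensorflow as tf

# Hypothetical data: 4096 sequences, 50 timesteps, 32 features each.
x = tf.random.normal([4096, 50, 32])
y = tf.random.normal([4096, 1])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(50, 32)),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# A small batch leaves the GPU mostly idle between the sequential timesteps;
# a large batch turns each timestep into a big matmul that saturates it.
model.fit(x, y, batch_size=512, epochs=1)
```

Timing the same fit with `batch_size=8` versus `batch_size=512` on a GPU is a quick way to see the effect for yourself.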

answered May 12 '23 by Alexandre Passos