 

Variables on CPU, training/gradients on GPU

Tags:

tensorflow

In the CIFAR-10 tutorial, I noticed that the variables are placed in CPU memory, yet cifar10-train.py states that the model is trained on a single GPU.

I'm quite confused: are the layer activations stored on the GPU? Or, alternatively, are the gradients stored on the GPU? Otherwise, it would seem that storing variables on the CPU makes no use of the GPU at all: everything sits in CPU memory, so only the CPU would be used for forward/backward propagation.

If the GPU were used for forward/backward propagation, wouldn't that be wasteful, given the latency of shuffling data between CPU and GPU?
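For context, one way to see where things actually run is to log device placement. Below is a minimal sketch (assuming the TF 1.x API that the CIFAR-10 tutorial uses) which pins a variable to the CPU, as the tutorial does, while letting the compute op land on the GPU if one is available; the variable name and shapes are just illustrative:

    import tensorflow as tf

    # Pin the variable to host (CPU) memory, as the CIFAR-10 tutorial does.
    with tf.device('/cpu:0'):
        w = tf.get_variable('w', shape=[4, 4], initializer=tf.ones_initializer())

    x = tf.random_normal([4, 4])
    y = tf.matmul(x, w)  # this op is placed on the GPU when one is available

    # log_device_placement prints the device each op/variable ends up on.
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(y)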

asked Dec 23 '15 by richizy


1 Answer

Indeed, in cifar10-train the activations and gradients are on the GPU; only the parameters are on the CPU. You are right that this is not optimal for single-GPU training, because of the cost of copying parameters between CPU and GPU. I suspect it is done this way so that a single library serves both single-GPU and multi-GPU models, since in the multi-GPU case it is probably faster to keep the parameters on the CPU. You can easily test what speedup you get by moving all variables to the GPU: just remove the "with tf.device('/cpu:0')" in "_variable_on_cpu" in cifar10.py.
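For reference, a rough sketch of what that change looks like; the first helper mirrors the shape of _variable_on_cpu in cifar10.py (exact details may differ from the tutorial source), and the second is a hypothetical variant with the device pin removed:

    import tensorflow as tf

    def _variable_on_cpu(name, shape, initializer):
        """Original helper: explicitly pins the variable to host (CPU) memory."""
        with tf.device('/cpu:0'):
            var = tf.get_variable(name, shape, initializer=initializer, dtype=tf.float32)
        return var

    def _variable_on_default_device(name, shape, initializer):
        """Same helper without the pin: the variable is created on whatever
        device the surrounding graph is placed on (e.g. the GPU)."""
        return tf.get_variable(name, shape, initializer=initializer, dtype=tf.float32)

Timing a few training steps with each version gives a quick measure of how much the CPU<->GPU parameter copies cost on a single GPU.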

answered Oct 07 '22 by Lukasz Kaiser