In the CIFAR-10 tutorial, I noticed that the variables are placed in CPU memory, but cifar10-train.py states that the model is trained with a single GPU.
I'm quite confused: are the layers/activations stored on the GPU? Or are the gradients stored on the GPU? Otherwise, storing the variables on the CPU would seem to not use the GPU at all: everything would sit in CPU memory, so only the CPU would be used for forward/backward propagation.
And if the GPU were used for forward/backward propagation, wouldn't that be wasteful because of the latency of shuffling data between CPU and GPU?
Indeed, in cifar10-train the activations and gradients are on the GPU; only the parameters are on the CPU. You are right that this is not optimal for single-GPU training because of the cost of copying parameters between CPU and GPU. I suspect it is done this way so that a single library can serve both single-GPU and multi-GPU models, since in the multi-GPU case it is probably faster to keep the parameters on the CPU. You can easily test what speedup you get by moving all variables to the GPU: just remove the "with tf.device('/cpu:0')" in "_variable_on_cpu" in cifar10.py.
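For reference, the helper looks roughly like the sketch below (reconstructed from memory of the tutorial code, so treat the exact signature as an assumption). Dropping the "with tf.device('/cpu:0')" block lets each variable follow the graph's default placement, which is the GPU in single-GPU training:

    import tensorflow as tf

    def _variable_on_cpu(name, shape, initializer):
      """Tutorial-style helper: pin the variable to CPU memory."""
      with tf.device('/cpu:0'):
        var = tf.get_variable(name, shape, initializer=initializer)
      return var

    def _variable_on_default_device(name, shape, initializer):
      """Same helper without the device pin: the variable lands on
      whatever device the surrounding graph uses (here, the GPU)."""
      return tf.get_variable(name, shape, initializer=initializer)

You can confirm where the variables and ops actually end up by creating the session with tf.ConfigProto(log_device_placement=True) and checking the placement log.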