In the CIFAR-10 tutorial, I noticed that the variables are placed in CPU memory, but cifar10-train.py states that the model is trained with a single GPU.
I'm quite confused: are the layers/activations stored on the GPU? Or are the gradients stored on the GPU? Otherwise, storing the variables on the CPU would seem to not use the GPU at all: everything would sit in CPU memory, so only the CPU would be used for forward/backward propagation.
And if the GPU were used for forward/backward propagation, wouldn't that be wasteful because of the latency of shuffling data between CPU and GPU?
Indeed, in cifar10-train the activations and gradients are on the GPU; only the parameters are on the CPU. You are right that this is not optimal for single-GPU training because of the cost of copying parameters between CPU and GPU. I suspect it is done this way so that a single library can serve both single-GPU and multi-GPU models, since in the multi-GPU case it is probably faster to keep the parameters on the CPU. You can easily test what speedup you get by moving all variables to the GPU: just remove the "with tf.device('/cpu:0')" in "_variable_on_cpu" in cifar10.py.
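For reference, the helper looks roughly like the sketch below (reconstructed from memory of the tutorial code, so treat the exact signature as an assumption). Dropping the "with tf.device('/cpu:0')" block lets each variable follow the graph's default placement, which is the GPU in single-GPU training:

    import tensorflow as tf

    def _variable_on_cpu(name, shape, initializer):
      """Tutorial-style helper: pin the variable to CPU memory."""
      with tf.device('/cpu:0'):
        var = tf.get_variable(name, shape, initializer=initializer)
      return var

    def _variable_on_default_device(name, shape, initializer):
      """Same helper without the device pin: the variable lands on
      whatever device the surrounding graph uses (here, the GPU)."""
      return tf.get_variable(name, shape, initializer=initializer)

You can confirm where the variables and ops actually end up by creating the session with tf.ConfigProto(log_device_placement=True) and checking the placement log.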