In a multi-GPU computer, how do I designate which GPU a CUDA job should run on?
As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples
then ran several instances of the nbody
simulation, but they all ran on GPU 0; GPU 1 was completely idle (monitored using watch -n 1 nvidia-smi
). Checking CUDA_VISIBLE_DEVICES
using
echo $CUDA_VISIBLE_DEVICES
I found this was not set. I tried setting it using
CUDA_VISIBLE_DEVICES=1
then running nbody
again but it also went to GPU 0.
I looked at the related question, how to choose designated GPU to run CUDA program?, but the deviceQuery
command is not in the CUDA 8.0 bin directory. In addition to $CUDA_VISIBLE_DEVICES
, I saw other posts refer to the environment variable $CUDA_DEVICES
, but these were not set and I did not find information on how to use them.
While not directly related to my question, using nbody -device=1
I was able to get the application to run on GPU 1 but using nbody -numdevices=2
did not run on both GPU 0 and 1.
I am testing this on a system running CentOS 6.8 with the bash shell, CUDA 8.0, two GTX 1080 GPUs, and NVIDIA driver 367.44.
I know that when writing CUDA code you can manage and control which CUDA resources to use, but how would I manage this from the command line when running a compiled CUDA executable?
Setting CUDA_VISIBLE_DEVICES=1 means your script will see only one GPU, which is GPU 1. Inside your script, however, it will be cuda:0 and not cuda:1, because the script sees only one GPU and its indices start at 0. For example, if you set CUDA_VISIBLE_DEVICES=2,4,5, your script will see 3 GPUs with indices 0, 1, and 2.
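To make the renumbering concrete, here is a minimal sketch in plain Python. The helper name logical_to_physical is hypothetical (it is not part of CUDA or any library), and it assumes the variable holds only plain integer indices, not GPU UUIDs. It simply mirrors the rule described above: the position in the list becomes the index the program sees.

```python
def logical_to_physical(visible_devices):
    """Map the index a program sees to the physical GPU index.

    Hypothetical helper for illustration only; assumes a comma-separated
    list of plain integer indices such as "2,4,5".
    """
    physical = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    # Position in the list is the in-process (logical) device index.
    return dict(enumerate(physical))


mapping = logical_to_physical("2,4,5")
print(mapping)  # {0: 2, 1: 4, 2: 5}
```

Note that which physical GPU an index refers to also depends on CUDA_DEVICE_ORDER; the default ordering may not match what nvidia-smi shows.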
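The problem was caused by not setting the CUDA_VISIBLE_DEVICES
variable within the shell correctly. A plain assignment such as CUDA_VISIBLE_DEVICES=1 on a line by itself creates a shell variable that is not passed to child processes unless it is exported.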
To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES
using
export CUDA_VISIBLE_DEVICES=1
or
CUDA_VISIBLE_DEVICES=1 ./cuda_executable
The former sets the variable for the life of the current shell; the latter sets it only for that particular invocation of the executable.
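If you launch the executable from a Python script instead of the shell, the per-invocation form roughly corresponds to passing a modified environment to the child process. The sketch below is only an illustration: REPORT is a stand-in command that prints what the child sees (in place of ./cuda_executable), and run_with_devices is a hypothetical helper name.

```python
import os
import subprocess
import sys

# Stand-in for ./cuda_executable: a child process that reports the value it sees.
REPORT = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES', 'unset'))",
]


def run_with_devices(devices):
    """Run the child with CUDA_VISIBLE_DEVICES set for that invocation only."""
    child_env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    result = subprocess.run(
        REPORT, env=child_env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


before = os.environ.get("CUDA_VISIBLE_DEVICES")
seen_by_child = run_with_devices("1")
after = os.environ.get("CUDA_VISIBLE_DEVICES")

print(seen_by_child)    # value seen inside the child process
print(before == after)  # the parent's own environment is left untouched
```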
If you want to specify more than one device, use
export CUDA_VISIBLE_DEVICES=0,1
or
CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable
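For the situation in the question (several instances that should each land on a different GPU), one possible approach is to start one process per GPU, each with its own CUDA_VISIBLE_DEVICES value. This is a sketch under assumptions: launch_per_gpu is a hypothetical helper, and the placeholder command just echoes the variable so the example runs without a GPU; you would substitute your own executable.

```python
import os
import subprocess
import sys


def launch_per_gpu(command, gpu_ids):
    """Start one copy of `command` per GPU id and collect each copy's stdout."""
    procs = []
    for gpu in gpu_ids:
        # Each child sees exactly one device, which it will address as device 0.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(
            subprocess.Popen(command, env=env, stdout=subprocess.PIPE, text=True)
        )
    # All children are already running; wait for each in launch order.
    return [p.communicate()[0].strip() for p in procs]


# Placeholder command (an assumption); replace with e.g. ["./cuda_executable"].
placeholder = [
    sys.executable,
    "-c",
    "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
]

outputs = launch_per_gpu(placeholder, [0, 1])
print(outputs)  # ['0', '1']
```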
If someone else is doing this in Python and it is not working, try setting the variable before importing pycuda and tensorflow.
For example:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
...
import pycuda.autoinit
import tensorflow as tf
...
As seen here.