In a multi-GPU computer, how do I designate which GPU a CUDA job should run on? As an example, when installing CUDA, I opted to install the <code>NVIDIA_CUDA-<#.#>_Samples</code> then ran several instances of the <code>nbody</code> simulation, but they all ran on one GPU 0; GPU 1 was completely idle (monitored using <code>watch -n 1 nvidia-dmi</code>). Checking <code>CUDA_VISIBLE_DEVICES</code> using <pre class="prettyprint"><code>echo $CUDA_VISIBLE_DEVICES </code></pre> I found this was not set. I tried setting it using <pre class="prettyprint"><code>CUDA_VISIBLE_DEVICES=1 </code></pre> then running <code>nbody</code> again but it also went to GPU 0. I looked at the related question, how to choose designated GPU to run CUDA program?, but <code>deviceQuery</code> command is not in the CUDA 8.0 bin directory. In addition to <code>$CUDA_VISIBLE_DEVICES$</code>, I saw other posts refer to the environment variable <code>$CUDA_DEVICES</code> but these were not set and I did not find information on how to use it. While not directly related to my question, using <code>nbody -device=1</code> I was able to get the application to run on GPU 1 but using <code>nbody -numdevices=2</code> did not run on both GPU 0 and 1. I am testing this on a system running using the bash shell, on CentOS 6.8, with CUDA 8.0, 2 GTX 1080 GPUs, and NVIDIA driver 367.44. I know when writing using CUDA you can manage and control which CUDA resources to use but how would I manage this from the command line when running a compiled CUDA executable?

The problem was caused by not setting the <code>CUDA_VISIBLE_DEVICES</code> variable within the shell correctly. To specify CUDA device <code>1</code> for example, you would set the <code>CUDA_VISIBLE_DEVICES</code> using <pre class="prettyprint"><code>export CUDA_VISIBLE_DEVICES=1 </code></pre> or <pre class="prettyprint"><code>CUDA_VISIBLE_DEVICES=1 ./cuda_executable </code></pre> The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation. If you want to specify more than one device, use <pre class="prettyprint"><code>export CUDA_VISIBLE_DEVICES=0,1 </code></pre> or <pre class="prettyprint"><code>CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable </code></pre>

In case of someone else is doing it in Python and it is not working, try to set it before do the imports of pycuda and tensorflow. I.e.: <pre class="prettyprint"><code>import os os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" os.environ["CUDA_VISIBLE_DEVICES"] = "0" ... import pycuda.autoinit import tensorflow as tf ... </code></pre> As saw here.

How do I select which GPU to run a job on?

Tags:

cuda

nvidia

In a multi-GPU computer, how do I designate which GPU a CUDA job should run on?

As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples then ran several instances of the nbody simulation, but they all ran on one GPU 0; GPU 1 was completely idle (monitored using watch -n 1 nvidia-dmi). Checking CUDA_VISIBLE_DEVICES using

echo $CUDA_VISIBLE_DEVICES

I found this was not set. I tried setting it using

CUDA_VISIBLE_DEVICES=1

then running nbody again but it also went to GPU 0.

I looked at the related question, how to choose designated GPU to run CUDA program?, but deviceQuery command is not in the CUDA 8.0 bin directory. In addition to $CUDA_VISIBLE_DEVICES$ , I saw other posts refer to the environment variable $CUDA_DEVICES but these were not set and I did not find information on how to use it.

While not directly related to my question, using nbody -device=1 I was able to get the application to run on GPU 1 but using nbody -numdevices=2 did not run on both GPU 0 and 1.

I am testing this on a system running using the bash shell, on CentOS 6.8, with CUDA 8.0, 2 GTX 1080 GPUs, and NVIDIA driver 367.44.

I know when writing using CUDA you can manage and control which CUDA resources to use but how would I manage this from the command line when running a compiled CUDA executable?

733

asked Sep 22 '16 21:09

Steven C. Howell

2 Answers

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly.

To specify CUDA device 1 for example, you would set the CUDA_VISIBLE_DEVICES using

export CUDA_VISIBLE_DEVICES=1

CUDA_VISIBLE_DEVICES=1 ./cuda_executable

The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation.

If you want to specify more than one device, use

export CUDA_VISIBLE_DEVICES=0,1

CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable

192

answered Oct 12 '22 07:10

Steven C. Howell

In case of someone else is doing it in Python and it is not working, try to set it before do the imports of pycuda and tensorflow.

I.e.:

import os os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" os.environ["CUDA_VISIBLE_DEVICES"] = "0" ... import pycuda.autoinit import tensorflow as tf ...

As saw here.

answered Oct 12 '22 05:10

Lucas

Related questions
                            
                                Error compiling CUDA from Command Prompt
                            
                                How and when should I use pitched pointer with the cuda API?
                            
                                Does __syncthreads() synchronize all threads in the grid?
                            
                                Cuda gridDim and blockDim
                            
                                CUDA or FPGA for special purpose 3D graphics computations? [closed]
                            
                                Does CUDA support recursion?
                            
                                Coding CUDA with C#?
                            
                                CUDA determining threads per block, blocks per grid
                            
                                Error Message : Cannot find or open the PDB file
                            
                                How can I flush GPU memory using CUDA (physical reset is unavailable)
                            
                                GPU Programming, CUDA or OpenCL? [closed]
                            
                                When to call cudaDeviceSynchronize?
                            
                                Passing pointers between C and Java through JNI
                            
                                LNK2038: mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease' in file.obj
                            
                                In CUDA, what is memory coalescing, and how is it achieved?
                            
                                nvidia-smi Volatile GPU-Utilization explanation?
                            
                                Streaming multiprocessors, Blocks and Threads (CUDA)
                            
                                Why is CUDA pinned memory so fast?
                            
                                Is it possible to run CUDA on AMD GPUs?
                            
                                Best approach for GPGPU/CUDA/OpenCL in Java?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How do I select which GPU to run a job on?

Tags:

cuda

nvidia

Steven C. Howell

People also ask

2 Answers

Steven C. Howell

Lucas

Recent Activity

Donate For Us