When a computer has multiple CUDA-capable GPUs, each GPU is assigned a device ID. By default, CUDA kernels execute on device ID 0. You can use cudaSetDevice(int device) to select a different device.

Let's say I have two GPUs in my machine: a GTX 480 and a GTX 670. How does CUDA decide which GPU is device ID 0 and which GPU is device ID 1?
Ideas for how CUDA might assign device IDs (just brainstorming):
Motivation: I'm working on some HPC algorithms, and I'm benchmarking and autotuning them for several GPUs. My processor has enough PCIe lanes to drive cudaMemcpys to 3 GPUs at full bandwidth. So, instead of constantly swapping GPUs in and out of my machine, I'm planning to just keep 3 GPUs in my computer. I'd like to be able to predict what will happen when I add or replace some GPUs in the computer.
You can use the Runtime API functions cudaDeviceGetByPCIBusId / cudaDeviceGetPCIBusId, or the CUDA Driver API functions cuDeviceGetByPCIBusId / cuDeviceGetPCIBusId. But IMO the most reliable way to know which device is which is to use NVML or nvidia-smi to get each device's unique identifier (UUID) using nvmlDeviceGetUUID, and then match it to the CUDA device whose pciBusId corresponds to the one reported by nvmlDeviceGetPciInfo.
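As a rough illustration (not part of the original answer), here is a minimal sketch of that matching approach. It assumes the NVML headers and library are installed (link with -lnvidia-ml -lcudart) and omits error checking:

    /* Sketch: for each NVML device, read its UUID and PCI bus ID, then ask
       the CUDA runtime which device ID owns that bus ID. */
    #include <stdio.h>
    #include <nvml.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        nvmlInit();

        unsigned int nvmlCount = 0;
        nvmlDeviceGetCount(&nvmlCount);

        for (unsigned int i = 0; i < nvmlCount; ++i) {
            nvmlDevice_t dev;
            nvmlDeviceGetHandleByIndex(i, &dev);

            char uuid[NVML_DEVICE_UUID_BUFFER_SIZE];
            nvmlDeviceGetUUID(dev, uuid, sizeof(uuid));   /* stable identifier */

            nvmlPciInfo_t pci;
            nvmlDeviceGetPciInfo(dev, &pci);              /* pci.busId, e.g. "0000:01:00.0" */

            int cudaId = -1;
            cudaDeviceGetByPCIBusId(&cudaId, pci.busId);  /* bus ID -> CUDA device ID */

            printf("%s (PCI %s) -> CUDA device %d\n", uuid, pci.busId, cudaId);
        }

        nvmlShutdown();
        return 0;
    }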
To run multiple instances of a single-GPU application on different GPUs, you can use the CUDA environment variable CUDA_VISIBLE_DEVICES. It restricts execution to a specific set of devices. To use it, just set CUDA_VISIBLE_DEVICES to a comma-separated list of GPU IDs.
CUDA_VISIBLE_DEVICES is read by the CUDA driver to decide which devices should be visible to CUDA.
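For example (a hypothetical invocation; my_benchmark stands in for any single-GPU application), you could launch two instances pinned to different GPUs:

    # run one instance on the first GPU and one on the second
    CUDA_VISIBLE_DEVICES=0 ./my_benchmark &
    CUDA_VISIBLE_DEVICES=1 ./my_benchmark &

Note that each process then sees only its assigned GPU, which appears to it as device 0.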
The canonical way to select a device in the runtime API is cudaSetDevice. That configures the runtime to perform lazy context establishment on the nominated device. Prior to CUDA 4.0, this call didn't actually establish a context; it just told the runtime which GPU to try to use.
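As an illustrative sketch (not part of the original answer), you might enumerate the devices first and then nominate one; error checking is omitted:

    /* Sketch: list the available devices, then select one with cudaSetDevice.
       The device IDs printed here follow whatever ordering the runtime chose. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("Device %d: %s (compute capability %d.%d, PCI bus %d)\n",
                   d, prop.name, prop.major, prop.minor, prop.pciBusID);
        }

        /* Subsequent runtime API calls on this host thread target device 0;
           the context is created lazily, on first use. */
        cudaSetDevice(0);
        return 0;
    }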
Set the environment variable CUDA_DEVICE_ORDER as:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
Then the GPU device IDs will be ordered by PCI bus ID.