Can you tell me how the CUDA runtime chooses a GPU device if two or more host threads use the CUDA runtime?
Does the runtime choose a separate GPU device for each thread?
Does the GPU device need to be set explicitly?
Thanks
Yes, the GPU device needs to be set explicitly in each thread; otherwise the default one is used (usually device 0).
Keep in mind that once the runtime starts using one device, all the functions called in the same thread will be pinned to that device.
Something I find useful when starting a thread is:

cudaThreadExit();        // clears all the runtime state for the current thread (deprecated since CUDA 4.0; cudaDeviceReset() is its replacement)
cudaSetDevice(deviceId); // explicitly set the current device for the subsequent calls in this thread
cudaMalloc(...);
cudaMemcpy(...);
// etc.
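To make that concrete, here is a minimal two-thread sketch of my own (not taken from the programming guide), where each host thread pins itself to a different device before allocating and copying:

#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

// Each host thread binds itself to one device; every runtime call it
// makes afterwards targets that device.
void worker(int deviceId)
{
    cudaSetDevice(deviceId);                   // pin this thread to deviceId

    float *devPtr = nullptr;
    cudaMalloc(&devPtr, 1024 * sizeof(float)); // allocated on deviceId

    float host[1024] = {0};
    cudaMemcpy(devPtr, host, sizeof(host), cudaMemcpyHostToDevice);

    cudaFree(devPtr);
    std::printf("thread done on device %d\n", deviceId);
}

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) { std::printf("need at least 2 GPUs\n"); return 0; }

    std::thread t0(worker, 0), t1(worker, 1);  // one host thread per device
    t0.join();
    t1.join();
    return 0;
}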
The programming guide has a chapter dedicated to it.
It depends on the compute mode the GPUs are set to. Call nvidia-smi -q to find the Compute Mode of your GPU. The output differs depending on the version of the CUDA framework you use.
Basically, GPUs are set to the default mode. It allows several contexts to run alternately on the same GPU. However, each context must explicitly release the GPU: while one context owns the GPU, the others are blocked for a short period and then killed after a timeout.
To bypass this limitation, you can call nvidia-smi -c with one of these explicit values, depending on your needs:

0 / DEFAULT: several contexts may share the device
1 / EXCLUSIVE_THREAD: only one context, used by a single thread at a time
2 / PROHIBITED: no context can be created on the device
3 / EXCLUSIVE_PROCESS: only one context, usable by all the threads of the process that owns it
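The compute mode can also be queried programmatically; here is a minimal sketch (the computeMode field and its enum values come from cudaDeviceProp in the runtime API, the rest is my own scaffolding):

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.computeMode is one of cudaComputeModeDefault,
        // cudaComputeModeExclusive, cudaComputeModeProhibited,
        // cudaComputeModeExclusiveProcess.
        std::printf("device %d (%s): compute mode %d\n",
                    dev, prop.name, prop.computeMode);
    }
    return 0;
}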
Yes, GPU devices need to be set explicitly.
One simple strategy would consist of setting all the GPUs to EXCLUSIVE_THREAD (as shown by jopasserat). A thread would then iterate through all the available GPUs and try to pick up a free one until it succeeds, as sketched just below. The same mechanism would work fine in the case of EXCLUSIVE_PROCESS.
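Here is a minimal sketch of that probing loop, assuming the devices have been put in one of the exclusive modes; acquireFreeDevice is a hypothetical helper name, and the cudaFree(0) call is only there to force context creation:

#include <cuda_runtime.h>

// Hypothetical helper: returns the id of the first free GPU it manages
// to grab, or -1 if every device is busy or unusable.
int acquireFreeDevice(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;

    for (int dev = 0; dev < count; ++dev) {
        if (cudaSetDevice(dev) != cudaSuccess)
            continue;                   // device cannot be selected at all
        // Force context creation. In an exclusive compute mode this
        // fails (e.g. with cudaErrorDevicesUnavailable) when another
        // context already owns the device.
        if (cudaFree(0) == cudaSuccess)
            return dev;                 // we now own this GPU
        cudaGetLastError();             // clear the error, try the next device
    }
    return -1;                          // no free device found
}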
See section 3.4, Compute Modes, in the CUDA Toolkit documentation.