Since CUDA 3.1 it has been possible to limit the list of GPUs visible to an application by setting the CUDA_VISIBLE_DEVICES environment variable.
This affects both the Runtime API and the Driver API (I have verified this myself). Device filtering appears to be enforced somewhere at the driver level, and there seems to be no way to bypass it.
However, I have encountered one closed-source application that somehow ignores this variable and always uses device 0, even when CUDA_VISIBLE_DEVICES is set to an empty string, in which case the application should not see any CUDA-capable device at all.
The application in question uses the same CUDA libraries as a dummy application that counts the available devices:
$ ldd a.out # dummy
linux-vdso.so.1 => (0x00007fff7ec60000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f606783a000)
libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f60675e3000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f60672dd000)
libm.so.6 => /lib64/libm.so.6 (0x00007f606704e000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6066e37000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6066aa7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f606688b000)
libz.so.1 => /lib64/libz.so.1 (0x00007f6066674000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f6066470000)
librt.so.1 => /lib64/librt.so.1 (0x00007f6066268000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6068232000)
$ ldd ../../bin/one.closed.source.application # application in question
linux-vdso.so.1 => (0x00007fffcf99c000)
libcufft.so.4 => /usr/local/cuda41/cuda/lib64/libcufft.so.4 (0x00007f06ce53a000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f06cdb44000)
libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f06cd8ed000)
libz.so.1 => /lib64/libz.so.1 (0x00007f06cd6cb000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f06cd4c7000)
librt.so.1 => /lib64/librt.so.1 (0x00007f06cd2bf000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f06ccfb8000)
libm.so.6 => /lib64/libm.so.6 (0x00007f06ccd34000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f06ccb1e000)
libc.so.6 => /lib64/libc.so.6 (0x00007f06cc78d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06cc571000)
/lib64/ld-linux-x86-64.so.2 (0x00007f06d0110000)
I'm curious how this trick is possible.
Rubber duck debugging really works.
It turns out that calling unsetenv before cuInit or cudaSetDevice is enough: the initial value of the environment variable is then ignored.
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

int main(void) {
    int x;
    /* Drop the mask before the driver is initialized. */
    unsetenv("CUDA_VISIBLE_DEVICES");
    cuInit(0);
    /* Now all devices on the machine are visible. */
    cuDeviceGetCount(&x);
    printf("%d\n", x);
    return 0;
}