Ignoring `CUDA_VISIBLE_DEVICES` environment variable

Since CUDA 3.1 it is possible to limit the list of GPUs visible to an application by setting the CUDA_VISIBLE_DEVICES environment variable.

This affects both the Runtime API and the Driver API (I've checked it myself to be sure). It seems that device filtering is enforced somewhere at the driver level, and there is no way to ignore it.

However, I've encountered a closed-source application that somehow ignores this variable and always uses device 0, even if CUDA_VISIBLE_DEVICES is set to an empty string, which should mean that the application does not see any CUDA-capable device.

The application in question uses the same CUDA libraries as a dummy application that counts the available devices:

$ ldd a.out  # dummy
    linux-vdso.so.1 =>  (0x00007fff7ec60000)
    libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f606783a000)
    libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f60675e3000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f60672dd000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f606704e000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6066e37000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f6066aa7000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f606688b000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f6066674000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f6066470000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f6066268000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f6068232000)


$ ldd ../../bin/one.closed.source.application # application in question
    linux-vdso.so.1 =>  (0x00007fffcf99c000)
    libcufft.so.4 => /usr/local/cuda41/cuda/lib64/libcufft.so.4 (0x00007f06ce53a000)
    libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007f06cdb44000)
    libcudart.so.4 => /usr/local/cuda41/cuda/lib64/libcudart.so.4 (0x00007f06cd8ed000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f06cd6cb000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f06cd4c7000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f06cd2bf000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f06ccfb8000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f06ccd34000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f06ccb1e000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f06cc78d000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f06cc571000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f06d0110000)

I'm curious how this trick is possible.

aland asked Apr 18 '26 10:04


1 Answer

Rubber duck debugging really works.

It turns out that it is enough to call unsetenv before calling cuInit or cudaSetDevice, and the initial value of the environment variable will be ignored.

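#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

int main(void) {
  int count = 0;
  /* Remove the variable before the driver initializes,
     so that its initial value is ignored. */
  unsetenv("CUDA_VISIBLE_DEVICES");
  if (cuInit(0) != CUDA_SUCCESS) {
    fprintf(stderr, "cuInit failed\n");
    return 1;
  }
  /* Now we see all the devices on the machine. */
  cuDeviceGetCount(&count);
  printf("%d\n", count);
  return 0;
}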

aland answered Apr 21 '26 01:04