When I run my Python script with CUDA_VISIBLE_DEVICES=2, TensorFlow still shows the following:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:86:00.0)
Consequently, my code fails with the following message:
Could not satisfy explicit device specification '/device:GPU:2' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
Could someone please explain what is going on?
A failure like this is often attributed to the CUDA and cuDNN libraries or the NVIDIA driver not being detected, in which case TensorFlow does not see the GPU at all. That is not the case here: the log shows TensorFlow creating /gpu:0 on a Tesla K80, so the GPU is detected.
TensorFlow supports running computations on a variety of types of devices, including CPU and GPU.
CUDA_VISIBLE_DEVICES is used to specify which GPUs should be visible to a CUDA application. CUDA_VISIBLE_DEVICES_ORIG is an LSF-internal environment variable.
To use it, set CUDA_VISIBLE_DEVICES to a comma-separated list of device IDs to make only those devices visible to the application. So setting it the way you do is valid. Selecting devices through the CUDA APIs themselves is a much lower-level way of controlling the GPU(s).
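As a minimal sketch of setting the variable from inside the script rather than on the command line: the helper name restrict_visible_gpus is hypothetical (it is not part of TensorFlow or CUDA), and the sketch assumes the variable is set before any CUDA-using library is imported, since it is read when CUDA initializes.

```python
import os


def restrict_visible_gpus(device_ids):
    """Hypothetical helper: expose only the given physical GPU ids to CUDA.

    Builds the comma-separated list that CUDA_VISIBLE_DEVICES expects and
    stores it in this process's environment.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Make only physical GPU 2 visible.
restrict_visible_gpus([2])

# Import TensorFlow (or any other CUDA library) only after this point,
# so that it sees the restricted device list.
```

This has the same effect as launching the script with CUDA_VISIBLE_DEVICES=2 in front of the command.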
Citing the explanation of CUDA_VISIBLE_DEVICES:
CUDA will enumerate the visible devices starting at zero. In the last case, devices 0, 2, 3 will appear as devices 0, 1, 2.
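So if you set CUDA_VISIBLE_DEVICES=2, then your GPU #2 will be denoted as gpu:0 inside TensorFlow. That is why '/device:GPU:2' cannot be satisfied: the only GPU registered in the process is /gpu:0, so your code should request '/device:GPU:0' instead.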
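The renumbering can be illustrated with a small pure-Python sketch. The function names logical_device_map and tf_device_name are made up for this illustration, and the sketch assumes the variable holds only valid integer device IDs (it ignores other forms such as GPU UUIDs). It only mimics the enumeration rule quoted above and does not query CUDA or TensorFlow.

```python
def logical_device_map(visible_devices):
    """Map each physical GPU id (as a string) to its logical index.

    Mimics the rule that CUDA enumerates the visible devices starting
    at zero, in the order they are listed in CUDA_VISIBLE_DEVICES.
    """
    ids = [tok.strip() for tok in visible_devices.split(",") if tok.strip()]
    return {physical: logical for logical, physical in enumerate(ids)}


def tf_device_name(visible_devices, physical_id):
    """Return the device string to request for a given physical GPU."""
    mapping = logical_device_map(visible_devices)
    return "/device:GPU:%d" % mapping[str(physical_id)]


# With CUDA_VISIBLE_DEVICES=2, physical GPU 2 is the only visible device.
print(tf_device_name("2", 2))

# With CUDA_VISIBLE_DEVICES=0,2,3, devices 0, 2, 3 appear as 0, 1, 2.
print(logical_device_map("0,2,3"))
```

Under these assumptions, asking for physical GPU 2 with CUDA_VISIBLE_DEVICES=2 yields '/device:GPU:0', which matches the device TensorFlow reports in the log.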