When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU which is allocated for the job? Is there an environment variable for this purpose? The GPUs I'm using are all nvidia GPUs. Thanks.
To use a GPU in a Slurm job, you need to request it explicitly when submitting the job with the --gres or --gpus flag. The following flags are available: --gres specifies the number of generic resources required per node, while --gpus specifies the number of GPUs required for the entire job.
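For example, a minimal batch-script sketch requesting one GPU (the job name, time limit, and executable are placeholders):

#!/bin/bash
#SBATCH --job-name=gpu-test      # placeholder job name
#SBATCH --gres=gpu:1             # request one GPU on the allocated node
#SBATCH --time=00:10:00          # placeholder time limit

srun ./my_gpu_program            # placeholder executable

Submit it with sbatch job.sh.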
Generic Resources (GRES) are resources associated with a specific node that can be allocated to jobs and steps. The most obvious example of GRES use would be GPUs. GRES are identified by a specific name and use an optional plugin to provide device-specific support.
Slurm, using the default node allocation plug-in, allocates nodes to jobs in exclusive mode. This means that even when all the resources within a node are not utilized by a given job, another job will not have access to these resources. Nodes possess resources such as processors, memory, swap, local disk, etc.
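Whether resources within a node can be shared between jobs depends on the cluster configuration. As a rough sketch, assuming a two-GPU node named node01 (the node name, CPU count, memory size, and device paths are all illustrative), a cons_tres-based configuration that lets Slurm allocate individual GPUs rather than whole nodes might look like:

# slurm.conf (illustrative excerpt)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu
NodeName=node01 CPUs=16 RealMemory=64000 Gres=gpu:2

# gres.conf on that node (illustrative)
NodeName=node01 Name=gpu File=/dev/nvidia[0-1]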
You can get the GPU ID with the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU IDs assigned to the job.
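As an illustration, a short snippet you could put in the script run by sbatch to print and iterate over the assigned IDs (the variable names in the loop are just for illustration):

echo "Assigned GPU(s): ${CUDA_VISIBLE_DEVICES}"
# Split the comma-separated list if you need each ID separately
IFS=',' read -ra gpu_ids <<< "${CUDA_VISIBLE_DEVICES}"
for id in "${gpu_ids[@]}"; do
    echo "Using GPU ${id}"
done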
Slurm stores this information in an environment variable, SLURM_JOB_GPUS.
One way to keep track of such information is to log all SLURM-related variables when running a job, for example (following Kaldi's slurm.pl, which is a great script to wrap Slurm jobs) by including the following command within the script run by sbatch:
set | grep SLURM | while read line; do echo "# $line"; done
You can check the environment variables SLURM_STEP_GPUS or SLURM_JOB_GPUS for a given node:
echo ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}
Note that CUDA_VISIBLE_DEVICES may not correspond to the physical GPU indices on the node (see @isarandi's comment).
Also, note this should work for non-Nvidia GPUs as well.
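On NVIDIA hardware, one way to cross-check which physical device was actually assigned is to list the devices visible inside the job and compare them with the Slurm-reported indices; a minimal sketch (the exact output depends on whether device cgroups are enabled on the cluster):

# List the GPUs visible to this job; with --gres=gpu:1 and device cgroups, only one should appear
nvidia-smi -L
# Compare against the Slurm-reported indices
echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS}"
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"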