How to access GPUs on different nodes in a cluster with Slurm?

Question

I have access to a cluster managed by Slurm, in which each node has 4 GPUs.

My code needs 8 GPUs.

How can I request 8 GPUs on a cluster where each node has only 4?

This is the job I tried to submit via sbatch:

#!/bin/bash
#SBATCH --gres=gpu:8              
#SBATCH --nodes=2               
#SBATCH --mem=16000M              
#SBATCH --time=0-01:00

But then I get the following error:

sbatch: error: Batch job submission failed: Requested node configuration is not available

Then I changed the settings to the following and submitted again:

#!/bin/bash
#SBATCH --gres=gpu:4              
#SBATCH --nodes=2               
#SBATCH --mem=16000M              
#SBATCH --time=0-01:00  
nvidia-smi

The output shows only 4 GPUs, not 8:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 0000:03:00.0     Off |                    0 |
| N/A   32C    P0    31W / 250W |      0MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 0000:04:00.0     Off |                    0 |
| N/A   37C    P0    29W / 250W |      0MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 0000:82:00.0     Off |                    0 |
| N/A   35C    P0    28W / 250W |      0MiB / 12193MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 0000:83:00.0     Off |                    0 |
| N/A   33C    P0    26W / 250W |      0MiB / 12193MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Thanks.

Bub Espinja · Accepted Answer

Slurm does not support what you need. It can only assign GPUs to your job per node, not per cluster: the count given to --gres=gpu is a per-node count, which is why requesting 8 fails on nodes that have only 4. So, unlike CPUs or other consumable resources, GPUs are not consumable and are bound to the node that hosts them.

If you are interested in this topic, there is a research effort to turn GPUs into consumable resources; see this paper, which describes how to do it using GPU virtualization technologies.
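The per-node assignment also explains the second attempt in the question: with --gres=gpu:4 and --nodes=2 the job holds 4 GPUs on each of two nodes, but the commands in a batch script run only on the first allocated node, so a bare nvidia-smi lists 4. Here is a minimal sketch of how to look at both nodes, assuming srun and nvidia-smi are available on the compute nodes. The helper total_gpus is hypothetical and only illustrates the arithmetic; none of this was tested on the asker's cluster.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:4
#SBATCH --mem=16000M
#SBATCH --time=0-01:00

# Hypothetical helper: --gres is counted per node, so the allocation's
# total is (number of nodes) * (GPUs requested per node).
total_gpus() {
    echo $(( $1 * $2 ))
}
echo "GPUs across the allocation: $(total_gpus 2 4)"

# The batch script itself runs on the first node only. srun starts one
# task per node here, and each task lists the GPUs local to its node.
if command -v srun >/dev/null 2>&1; then
    srun bash -c 'echo "$(hostname):"; nvidia-smi -L'
fi
```

Each node should then report its own 4 GPUs. Using all 8 still requires a program that runs as several processes across nodes (for example via MPI); Slurm will not present them as 8 devices on a single node.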

Tags: gpu, cluster-computing, slurm

Mehran
