
Specify multiple GRES type options in SLURM

Tags: slurm

I've been using SLURM to request specific GPUs, like so:

--gres=gpu:TYPE:1

The cluster I'm using has 4 different GPUs available, each with its own GRES type.

For some jobs I don't care which GPU is used, so I can specify:

--gres=gpu:1

However, sometimes I'd like to allow a few specific types without caring which of them I get: basically, whichever one becomes available first.

So I would hope to specify something like:

--gres=gpu:TYPE1:1 OR --gres=gpu:TYPE2:1

So that it would pick whichever is available first.

However, I've been unable to find such an option. Does this option exist in SLURM?

DGIB asked Jul 16 '19 08:07




1 Answer

Unlike the --constraint option, the --gres option does not accept logical constructs such as OR. One workaround is to submit two jobs, one per GPU type, and scancel the one that starts later.
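The two-job workaround can be scripted. What follows is only a minimal sketch, not an official SLURM feature: the function name submit_first_available, the POLL_SECONDS variable, and the placeholders job.sh, TYPE1 and TYPE2 are all assumptions made for illustration. It submits one copy of the job per acceptable GPU type, polls squeue until one copy is no longer pending, and cancels the rest.

```shell
#!/usr/bin/env bash
# Sketch only: job.sh, TYPE1 and TYPE2 are placeholders, and
# submit_first_available / POLL_SECONDS are hypothetical names.

# Usage: submit_first_available SCRIPT GPU_TYPE [GPU_TYPE...]
# Prints the ID of the job that was kept.
submit_first_available() {
    local script ids type id state winner
    script=$1
    shift
    ids=""
    winner=""

    # Submit one copy of the job per acceptable GPU type.
    # --parsable makes sbatch print "jobid" or "jobid;cluster".
    for type in "$@"; do
        id=$(sbatch --parsable --gres="gpu:${type}:1" "$script") || return 1
        ids="$ids ${id%%;*}"
    done

    # Poll until one copy is no longer PENDING. Any other state, or no
    # output at all (the job already left the queue), counts as "started".
    while [ -z "$winner" ]; do
        for id in $ids; do
            state=$(squeue -h -j "$id" -o %T 2>/dev/null)
            if [ "$state" != "PENDING" ]; then
                winner=$id
                break
            fi
        done
        if [ -z "$winner" ]; then
            sleep "${POLL_SECONDS:-30}"
        fi
    done

    # Cancel every copy except the one that started.
    for id in $ids; do
        if [ "$id" != "$winner" ]; then
            scancel "$id"
        fi
    done

    echo "$winner"
}
```

It would be called as submit_first_available job.sh TYPE1 TYPE2. Because the check is a poll, both copies may start within the same interval, in which case the sketch cancels a job that is already running; a shorter POLL_SECONDS narrows that window but queries the scheduler more often.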

damienfrancois answered Oct 06 '22 03:10