I've been using SLURM to request specific GPUs, like so:
--gres=gpu:TYPE:1
On the cluster I'm using there are 4 different GPUs available, all with their specific gres types.
For some jobs I don't care which GPU is used, so I can specify:
--gres=gpu:1
However, sometimes I'd like one of a few specific types, but among those I don't really care which one I get. Basically, the first one that becomes available.
So I would hope to specify something like:
--gres=gpu:TYPE1:1 OR --gres=gpu:TYPE2:1
So that it would pick whichever is available first.
However, I've been unable to find such an option. Does this option exist in SLURM?
Slurm supports the ability to define and schedule arbitrary Generic RESources (GRES). Additional built-in features are enabled for specific GRES types, including Graphics Processing Units (GPUs), CUDA Multi-Process Service (MPS) devices, and Sharding through an extensible plugin mechanism.
There are two ways to allocate GPUs in Slurm: either the general --gres=gpu:N parameter, or the specific parameters like --gpus-per-task=N. There are also two ways to launch MPI tasks in a batch script: either using srun, or using the usual mpirun (when OpenMPI is compiled with Slurm support).
You need to use -w node0xx or --nodelist=node0xx. You need to provide the partition too, unless you want to get a "requested node not in this partition" error, as some nodes can be in several partitions (in my case we have a node that's in both the fat and the fat_short partitions).
$SLURM_SUBMIT_DIR is a variable holding the directory from which sbatch was run. Next we actually run the programme. We redirect the GMIN output to a log file. If you do not do this redirection, output will instead go to the slurm-<job_ID>.out file, which most likely resides on /sharedscratch/.
gres.conf - Slurm configuration file for Generic RESource (GRES) management. gres.conf is an ASCII file which describes the configuration of Generic RESource (GRES) on each compute node. If the GRES information in the slurm.conf file does not fully describe those resources, then a gres.conf file should be included on each compute node.
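With AutoDetect enabled in gres.conf, configuration details are filled in automatically for any system-detected GPU. This removes the need to explicitly configure GPUs in gres.conf, though the Gres= line in slurm.conf is still required in order to tell slurmctld how many GRES to expect. By default, all system-detected devices are added to the node.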
So in your slurm.conf and gres.conf, the GPU Type can be set to geforce, rtx, 2060, geforce_rtx_2060, or any other substring, and slurmd should be able to match it to the system-detected device geforce_rtx_2060. Jobs will not be allocated any generic resources unless specifically requested at job submit time using the options:
Slurm supports no generic resources in the default configuration. One must explicitly specify which resources are to be managed in the slurm.conf configuration file. The configuration parameters of interest are GresTypes and Gres . For more details, see GresTypes and Gres in the slurm.conf man page.
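As a rough illustration of the configuration described above, a cluster with two GPU types might declare them along these lines. This is only a sketch: the node names, GPU counts, and device paths are hypothetical, and the Features= tags are an assumption added so that jobs could also select nodes by type with --constraint.

```
# slurm.conf (fragment; node names and counts are hypothetical)
GresTypes=gpu
NodeName=node01 Gres=gpu:TYPE1:2 Features=TYPE1
NodeName=node02 Gres=gpu:TYPE2:2 Features=TYPE2

# gres.conf (fragment; device paths are hypothetical)
NodeName=node01 Name=gpu Type=TYPE1 File=/dev/nvidia[0-1]
NodeName=node02 Name=gpu Type=TYPE2 File=/dev/nvidia[0-1]
```

Whether your cluster exposes GPU types as node features is up to the administrators; sinfo -o "%N %G %f" lists the GRES and features per node.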
Contrary to the --constraint option, the --gres option does not allow logical constructs. One option would be to submit two jobs and scancel the one that starts later.
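Both workarounds can be sketched in a few lines of shell. The helper names join_features and submit_any_gpu are hypothetical (they are not Slurm commands), and the first helper only helps if the administrators have tagged nodes with feature names that match the GPU types, which is an assumption about your cluster.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not Slurm commands.

# Workaround 1: build an OR expression for --constraint.
# Assumes nodes carry feature names that match the GPU types.
join_features() (
  IFS='|'              # "$*" joins the arguments with the first character of IFS
  printf '%s\n' "$*"
)
# Usage:
#   sbatch --gres=gpu:1 --constraint="$(join_features TYPE1 TYPE2)" job.sh

# Workaround 2: submit one copy of the job per GPU type under a shared job name.
# Prints the job ID of each copy (sbatch --parsable).
submit_any_gpu() {
  script=$1
  name=$2
  shift 2
  for gpu_type in "$@"; do
    sbatch --parsable --job-name="$name" --gres="gpu:${gpu_type}:1" "$script"
  done
}
# Usage:
#   submit_any_gpu job.sh anygpu TYPE1 TYPE2
# Then, as the first command inside job.sh, cancel the copies still pending:
#   scancel --user="$USER" --name="$SLURM_JOB_NAME" --state=PENDING
```

The second approach is not race-free: if both copies start at nearly the same moment, neither is still pending when scancel runs, so both will execute. Pick a job name that you do not use for anything else, since scancel matches on it.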