 

How to submit a job to any [subset] of nodes from nodelist in SLURM?

I have a couple of thousand jobs to run on a SLURM cluster with 16 nodes. These jobs should run only on a subset of 7 of the available nodes. Some of the tasks are parallelized and use all the CPU power of a single node, while others are single-threaded; therefore, multiple jobs should run at the same time on a single node. None of the tasks should span multiple nodes.

Currently I submit each of the jobs as follows:

sbatch --nodelist=myCluster[10-16] myScript.sh

However, this parameter makes Slurm wait until the submitted job terminates before starting the next one, and hence leaves 3 nodes completely unused; depending on the task (multi- or single-threaded), the currently active node might also be under low CPU load.

What are the best sbatch parameters to force Slurm to run multiple jobs at the same time on the specified nodes?

asked Oct 06 '14 by Faber


People also ask

How do I submit a job to Slurm?

There are two ways of submitting a job to SLURM: submit via a SLURM job script (a bash script that includes directives to the SLURM scheduler), or submit via command-line options (directives provided to SLURM as command-line arguments).
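
A minimal sketch of both approaches (the script name job.sh, the job name, and the resource values are hypothetical, not from the original):

#!/bin/bash
#SBATCH --job-name=demo     # directive read by the scheduler, ignored by bash
#SBATCH --ntasks=1
srun hostname

Submitted with:

sbatch job.sh

Or equivalently, passing the same directives on the command line:

sbatch --job-name=demo --ntasks=1 --wrap="hostname"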

Which Slurm command is used to submit a batch job?

Use a batch job to receive an allocation of compute resources and have your commands run there. sbatch is the Slurm command to submit a script (e.g., a .slurm submission script) as a batch job. Here is a simple example, submitting a bash script as a batch job.
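
For illustration, a sketch of such a script and its submission (the file name example.slurm, its contents, and the job ID shown are hypothetical):

#!/bin/bash
#SBATCH --ntasks=1          # a single task
#SBATCH --time=00:05:00     # five-minute time limit
echo "Running on $(hostname)"

Submitting it returns the assigned job ID:

sbatch example.slurm
Submitted batch job 123456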

How many jobs can Slurm handle?

Please note that the hard maximum number of jobs that the SLURM scheduler can handle is 10000. It is best to limit the number of jobs you have submitted at any given time to less than half this amount, in case another user also wants to submit a large number of jobs.

What is Ntasks per node?

--ntasks-per-node=<ntasks> - Request that ntasks be invoked on each node. If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node. Meant to be used with the --nodes option.
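
For example (node and task counts here are illustrative), this requests 2 nodes running 4 tasks each, 8 tasks in total:

sbatch --nodes=2 --ntasks-per-node=4 myScript.sh

whereas adding --ntasks=6 runs 6 tasks in total, with --ntasks-per-node=4 acting only as a per-node ceiling:

sbatch --nodes=2 --ntasks=6 --ntasks-per-node=4 myScript.sh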




2 Answers

You can work the other way around: rather than specifying which nodes to use (with the effect that each job is allocated all 7 nodes), specify which nodes not to use:

sbatch --exclude=myCluster[01-09] myScript.sh

and Slurm will never allocate more than those 7 nodes to your jobs. Make sure, though, that the cluster configuration allows node sharing, and that your myScript.sh contains #SBATCH --ntasks=1 --cpus-per-task=n, with n the number of threads of each job.
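
As a minimal sketch, myScript.sh for a 4-threaded job could look as follows (the thread count, the OpenMP assumption, and the program name my_program are hypothetical):

#!/bin/bash
#SBATCH --ntasks=1          # one task, so the job never spans nodes
#SBATCH --cpus-per-task=4   # n = 4 threads for this job
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # assuming OpenMP-style threading
./my_program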

answered Oct 13 '22 by damienfrancois


Some of the tasks are parallelized and use all the CPU power of a single node, while others are single-threaded.

I understand that you want the single-threaded jobs to share a node, whereas the parallel ones should be assigned a whole node exclusively?

multiple jobs should run at the same time on a single node.

As far as my understanding of SLURM goes, this implies that you must define CPU cores as consumable resources (i.e., SelectType=select/cons_res and SelectTypeParameters=CR_Core in slurm.conf).
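
Concretely, that corresponds to these two lines in slurm.conf (a cluster-wide setting, so changing it normally requires administrator access):

SelectType=select/cons_res
SelectTypeParameters=CR_Core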

Then, to constrain parallel jobs to a whole node, you can either use the --exclusive option (but note that partition configuration takes precedence: you can't have shared nodes if the partition is configured for exclusive access), or use -N 1 --ntasks-per-node="number_of_cores_in_a_node" (e.g., -N 1 --ntasks-per-node=8).

Note that the latter will only work if all nodes have the same number of cores.

None of the tasks should span multiple nodes.

This should be guaranteed by -N 1.
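
Putting it all together, the two kinds of jobs could be submitted along these lines (the 8-core node size and the script names are illustrative assumptions):

# parallel job: request a whole node exclusively
sbatch --exclusive -N 1 parallel_job.sh

# or, on 8-core nodes, claim every core of one node
sbatch -N 1 --ntasks-per-node=8 parallel_job.sh

# single-threaded job: one core, so several such jobs can share a node
sbatch -N 1 --ntasks=1 single_job.sh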

answered Oct 13 '22 by Riccardo Murri