SLURM Submit multiple tasks per node?

I found some very similar questions that helped me arrive at a script that seems to work. However, I'm still not sure I fully understand why it works, hence this question.

My problem (example): On 3 nodes, I want to run 12 tasks on each node (so 36 tasks in total). Each task uses OpenMP and should use 2 CPUs. In my case, a node has 24 CPUs and 64 GB of memory. My script would be:

#SBATCH --nodes=3
#SBATCH --ntasks=36
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2000

export OMP_NUM_THREADS=2

for i in {1..36}; do
    srun -N 1 -n 1 ./program input${i} >& out${i} &
done

wait

This seems to work as I require: it runs tasks on one node until all CPUs on that node are in use, then continues on the next node until all of its CPUs are in use, and so on.
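As a sanity check on the request above, the arithmetic can be sketched in a few lines of shell. The numbers come from the question; reading "64GB" as roughly 64000 MB is an assumption, and the memory Slurm actually treats as usable on a node may be lower.

```shell
#!/bin/bash
# Values taken from the question.
nodes=3
cpus_per_node=24
mem_per_node_mb=64000   # assumption: "64GB" read as about 64000 MB
ntasks=36
cpus_per_task=2
mem_per_cpu_mb=2000

# CPUs the job asks for versus CPUs the three nodes provide.
cpus_requested=$(( ntasks * cpus_per_task ))
cpus_available=$(( nodes * cpus_per_node ))

# How many 2-CPU tasks fit on one node, and the memory a full node would need.
tasks_per_node=$(( cpus_per_node / cpus_per_task ))
mem_needed_per_node_mb=$(( cpus_per_node * mem_per_cpu_mb ))

echo "CPUs requested/available: ${cpus_requested}/${cpus_available}"
echo "Tasks per full node: ${tasks_per_node}"
echo "Memory per full node: ${mem_needed_per_node_mb} MB of ${mem_per_node_mb} MB"
```

Under these assumptions the request exactly fills the three nodes' CPUs (72 of 72) and stays within the per-node memory.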

My question: I'm not sure this is actually what it does, as I didn't fully understand the srun man page regarding -n, and I have not used srun before. My confusion mainly comes from "-n": the man page says "The default is one task per node, ..", so I expected "srun -n 1" to run only one task on each node, which doesn't seem to be the case. Furthermore, when I tried e.g. "srun -n 2 ./program", it seemed to just run the exact same program twice as two different tasks, with no way to use different input files. I can't think of why that would ever be useful.

Your setup is correct except that you must use the --exclusive option of srun (which has a different meaning in this case than for sbatch).
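A minimal sketch of the question's loop with the step-level --exclusive flag added might look like the following. The helper name launch_steps and the SLURM_JOB_ID guard are illustrative assumptions, not part of the original script; the exact behaviour of step-level flags can differ between Slurm versions, so the local srun man page is the reference.

```shell
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks=36
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2000

export OMP_NUM_THREADS=2

# launch_steps is a hypothetical helper: it starts one background job step
# per input file, using the launcher command given as its first argument.
launch_steps() {
    local launcher=$1 first=$2 last=$3
    local i=$first
    while [ "$i" -le "$last" ]; do
        # --exclusive here is the srun (step-level) option: it asks for CPUs
        # dedicated to this step, so concurrent steps do not share cores.
        "$launcher" --exclusive -N 1 -n 1 ./program "input${i}" > "out${i}" 2>&1 &
        i=$((i + 1))
    done
    wait   # block until every background step has finished
}

# Only launch when running inside a Slurm allocation.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    launch_steps srun 1 36
fi
```

Passing the launcher as an argument makes a dry run possible: calling launch_steps echo 1 3 in a scratch directory writes the srun arguments that would be used into out1 to out3 instead of starting anything.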

As for your remark regarding the usefulness of srun, the behaviour of the program can be changed based on the environment variable $SLURM_PROCID, or the rank in the case of an MPI program. Your confusion arises from the fact that your program is not written to be parallel (apart from the 2 OMP threads), while srun is meant to start parallel programs, most of the time based on MPI.
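To illustrate how identical tasks can behave differently, here is a small sketch of a script that branches on its rank. The function name describe_task and the "coordinating"/"working" labels are made up for the example; SLURM_PROCID and SLURM_NTASKS are the variables srun exports to each task, and the defaults only matter when the script is run outside Slurm.

```shell
#!/bin/bash
# describe_task is a hypothetical helper: it prints a different line for
# each task, based on the rank variables srun exports to every task.
describe_task() {
    local rank=${SLURM_PROCID:-0}     # 0-based rank of this task in the step
    local ntasks=${SLURM_NTASKS:-1}   # number of tasks, defaulting to 1
    if [ "$rank" -eq 0 ]; then
        echo "task ${rank} of ${ntasks}: coordinating"
    else
        echo "task ${rank} of ${ntasks}: working"
    fi
}

describe_task
```

Saved as, say, task.sh (a hypothetical name) and started with srun -n 2 ./task.sh, each of the two tasks would print its own line, in no guaranteed order.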

Another way is to run all your tasks at once. Since the input and output files depend on the rank, a wrapper is needed.

Your SLURM script would be

#SBATCH --nodes=3
#SBATCH --ntasks=36
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2000

export OMP_NUM_THREADS=2

srun -n 36 ./program.sh

and your wrapper program.sh would be

#!/bin/sh

# SLURM_PROCID is 0-based (0..35); add 1 to match the question's input1..input36
i=$((SLURM_PROCID + 1))
exec ./program "input${i}" > "out${i}" 2>&1

Tags:

hpc

slurm

sbatch

job-scheduling

Shiwayari

2 Answers

damienfrancois

Gilles Gouaillardet
