What's the difference between the following two parallelization schemes on Slurm?
Scheme 1
Run with sbatch script.sh:
#!/bin/bash
#SBATCH --ntasks=8
## more options
srun echo hello
This runs echo hello 8 times.
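To see which task prints each line, the task ID could be included (assuming the standard SLURM_PROCID variable that srun sets for each task):
srun bash -c 'echo "hello from task $SLURM_PROCID"'
The single quotes matter here: they make $SLURM_PROCID expand inside each task's shell rather than once at submission time.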
Scheme 2
I've accomplished something similar using array jobs:
#!/bin/bash
#SBATCH --job-name=arrayJob
#SBATCH --output=arrayJob_%A_%a.out
#SBATCH --error=arrayJob_%A_%a.err
#SBATCH --array=1-8
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
# Print this sub-job's array task ID
echo "hello from array task ${SLURM_ARRAY_TASK_ID}"
Is there any difference between the two schemes? They both seem to accomplish the same thing.
Scheme 1 is a single job (with 8 tasks), while Scheme 2 is 8 distinct jobs (each with one task). In the first case, all the tasks are scheduled at the same time; in the second case, the 8 tasks are scheduled independently of one another.
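One way to observe the difference (assuming the jobs are still queued or running) is with squeue: Scheme 1 appears as a single job ID, while Scheme 2 appears as entries like 12345_1 through 12345_8. The %A and %a placeholders in the output file names above correspond to the array's master job ID and the task ID, respectively.
squeue -u $USER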
With the job array (Scheme 2), if 8 CPUs become available at once, all 8 tasks will start at the same time, but if only 4 CPUs become available at first, 4 tasks will run while the other 4 remain pending. When the initial 4 are done, the other 4 are started. This is typically used for embarrassingly parallel jobs, where the processes do not need to communicate or synchronise, such as applying the same program to a list of files, as sketched below.
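A minimal sketch of that file-per-task pattern, where filelist.txt and ./process_file are hypothetical placeholders:
#!/bin/bash
#SBATCH --job-name=perFile
#SBATCH --array=1-8
#SBATCH --ntasks=1
# Pick the line of filelist.txt that matches this task's (1-based) ID
FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
# Each array task processes its own file; no inter-task communication needed
./process_file "$FILE"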
By contrast, with a single job (Scheme 1), Slurm will start all 8 tasks at the same time, so it needs 8 CPUs to become available at the same time. This is typically used for parallel jobs whose processes need to communicate with each other, for instance through a Message Passing Interface (MPI) library.
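For illustration, a minimal submission script for that case, where ./mpi_app stands in for any MPI-enabled binary:
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --time=01:00:00
# srun launches all 8 tasks together, so the MPI ranks can communicate
srun ./mpi_app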