
parallel but different Slurm srun job step invocations not working

Tags:

slurm

I'd like to run the same program on a large number of different input files. I could submit each file as a separate Slurm job, but I don't want to swamp the queue by dumping thousands of jobs on it at once. Instead, I've been trying to process the same number of files by creating an allocation first, then looping over all the files within that allocation with srun, giving each invocation a single core from the allocation. The problem is that no matter what I do, only one job step runs at a time. The simplest test case I could come up with is:

#!/usr/bin/env bash

srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &
srun --exclusive --ntasks 1 -c 1 sleep 1 &

wait

It doesn't matter how many cores I assign to the allocation:

time salloc -n 1 test
time salloc -n 2 test
time salloc -n 4 test

it always takes 4 seconds. Is it not possible to have multiple job steps execute in parallel?
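
The pattern I'm ultimately after looks roughly like this (my_prog and the inputs/ directory are placeholder names, not my real ones):

#!/usr/bin/env bash
# one allocation, many single-core job steps: launch one step per input file
for f in inputs/*; do
    srun --exclusive --ntasks 1 -c 1 ./my_prog "$f" &
done
wait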

Cyclone asked Feb 19 '16


People also ask

What is the difference between Sbatch and Srun?

srun executes in interactive, blocking mode, while sbatch executes in batch, non-blocking mode. srun is mostly used to run jobs immediately, but sbatch can be used to schedule jobs for later execution. Note that if your SSH session is interrupted for any reason, the srun job will be cancelled.
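
A minimal illustration of the difference (hostname here is just a stand-in for real work):

# srun runs the command right away and blocks until it finishes
srun --ntasks=1 hostname

# sbatch queues the work and returns immediately; the job runs
# whenever the scheduler allocates resources for it
sbatch --wrap="hostname"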

How do you use SRUN in Slurm?

After you type an srun command with its options on the command line and press enter, Slurm finds and allocates the resources you specified. Depending on what you requested, it can take a few minutes for Slurm to allocate those resources. You can view all of the srun options on the Slurm documentation website.
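
As an illustrative example (the resource values are arbitrary), the following requests one task with four CPUs for ten minutes and drops into an interactive shell:

srun --nodes=1 --ntasks=1 --cpus-per-task=4 --time=00:10:00 --pty bash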

How many jobs can Slurm handle?

There is no single fixed limit; it depends on the limits configured for the association or QOS, such as MaxJobs (how many jobs may run at once) and MaxSubmitJobs (how many may be submitted). With MaxJobs=20 and MaxSubmitJobs=50 in effect, for example, at most 20 jobs run concurrently and at most 50 can be in the queue at a time.
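
These limits are typically set in the accounting database, for example per user with sacctmgr (the user name and values below are placeholders):

sacctmgr modify user where name=someuser set MaxJobs=20 MaxSubmitJobs=50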

Which Slurm command is used to submit a batch job?

sbatch is the command used to submit a batch job. If you need to run commands interactively inside an allocation instead, you can use the salloc command to get a Slurm job allocation, execute a command (such as srun or a shell script containing srun commands), and then, when the command finishes, enter exit to release the allocated resources.
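
For a non-interactive submission, a minimal batch script could look like this (the resource values are placeholders):

#!/usr/bin/env bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00

srun hostname

Submit it with sbatch, e.g. sbatch job.sh.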


1 Answer

It turned out that the default memory per CPU was not defined, so even single-core job steps were reserving all of the node's RAM. With each step implicitly claiming all the memory, only one step could run at a time.

Setting DefMemPerCPU in slurm.conf, or specifying explicit memory reservations on each step, did the trick.
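
For example, either of the following addresses it (the memory values are illustrative, not taken from the original answer):

# slurm.conf: give each allocated CPU a default memory share so a
# single-core step no longer implicitly claims the whole node's RAM
DefMemPerCPU=2048    # in MB

# or request memory explicitly for each job step
srun --exclusive --ntasks 1 -c 1 --mem-per-cpu=1G sleep 1 &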

Cyclone answered Nov 15 '22