In an sbatch script, you can launch programs or scripts directly (for example, an executable file myapp), but many tutorials use srun myapp instead.
Despite reading some documentation on the topic, I do not understand the difference and when to use each of those syntaxes.
I hope this question is precise enough (1st question on SO), thanks in advance for your answers.
The main difference is that srun is interactive and blocking (you get the result in your terminal and you cannot write other commands until it is finished), while sbatch is batch processing and non-blocking (results are written to a file and you can submit other commands right away).
After typing your srun command and options on the command line and pressing enter, Slurm will find and then allocate the resources you specified. Depending on what you specified, it can take a few minutes for Slurm to allocate those resources. You can view all of the srun options on the Slurm documentation website.
srun is the command used to run a process on the compute nodes of the cluster. You pass it a command (which may be a script); the command runs on a compute node, and srun returns once it has finished. srun accepts many command-line options that specify the resources required by that command.
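As a rough illustration of the blocking and non-blocking styles described above, the sketch below writes a tiny batch script and then runs the same command both ways. The file name job.sh, the resource values, and the use of hostname as the workload are assumptions chosen for illustration; the Slurm calls are guarded so the snippet only writes the file on a machine without Slurm.

```shell
#!/bin/bash
# Write a minimal batch script (job.sh is a hypothetical name).
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=demo-%j.out

# Runs on the first allocated node once the job starts.
hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    # Non-blocking: returns at once with a job ID;
    # the output is written to demo-<jobid>.out.
    sbatch job.sh

    # Blocking: waits for an allocation, then prints the result
    # in this terminal before the prompt comes back.
    srun --ntasks=1 --time=00:05:00 hostname
else
    echo "Slurm not found: job.sh was written but nothing was submitted."
fi
```

The time limit and task count are placeholders; a real cluster will usually also need a partition or account option, which is site-specific and left out here.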
mpirun starts a proxy on each node, and the proxies then start the MPI tasks (i.e. the MPI tasks are not directly known by the resource manager). srun, on the other hand, starts the MPI tasks directly, but that requires some support (PMI or PMIx) from Slurm.
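The two launch styles might look like this inside a batch script. This is only a sketch: mpi_app and mpi_job.sh are hypothetical names, and whether pmix is the right value for --mpi depends on how Slurm and the MPI library were built at a given site, which srun --mpi=list can report.

```shell
#!/bin/bash
# Write an example batch script (mpi_job.sh and mpi_app are hypothetical names).
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=00:10:00

# Option A: the MPI launcher starts its own proxies, which start the ranks.
# mpirun -np "$SLURM_NTASKS" ./mpi_app

# Option B: Slurm starts the ranks directly; needs PMI or PMIx support.
srun --mpi=pmix ./mpi_app
EOF

if command -v srun >/dev/null 2>&1; then
    # List the MPI plugin types this Slurm installation supports.
    srun --mpi=list
else
    echo "Slurm not found: mpi_job.sh was written but not submitted."
fi
```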
The srun command is used to create job 'steps'.
First, it brings better reporting of resource usage: the sstat command provides real-time resource usage for processes started with srun, and each step (each call to srun) is reported individually in the accounting.
Second, it can be used to set up many instances of a serial program (a program that uses only one CPU) within a single job, and to micro-schedule those programs inside the job allocation.
Finally, for parallel jobs, srun also plays the important role of starting the parallel program and setting up the parallel environment. It starts as many instances of the program as were requested with the --ntasks option, on the CPUs that were allocated to the job. In the case of an MPI program, it also handles the communication between the MPI library and Slurm.
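The three roles above can be sketched in one batch script. The names steps_job.sh, parallel_app, serial_app and the input_N.dat files are hypothetical, and the resource values are placeholders. The --exclusive flag on the serial steps is one way to keep them from sharing CPUs; recent Slurm versions also offer --exact for this, so treat the exact flag as an assumption to check against the local srun documentation.

```shell
#!/bin/bash
# Write an example batch script with several job steps.
cat > steps_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=steps-demo
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

# Step 0: one parallel step; srun starts 4 instances, one per task.
srun ./parallel_app

# Steps 1-4: four serial runs micro-scheduled inside the same allocation.
for i in 1 2 3 4; do
    srun --ntasks=1 --exclusive ./serial_app "input_${i}.dat" &
done
wait   # keep the batch script alive until the background steps finish
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print the job ID instead of a sentence.
    jobid=$(sbatch --parsable steps_job.sh)
    echo "Submitted job ${jobid}"
    # While it runs:  sstat -j "${jobid}.0" --format=JobID,AveCPU,MaxRSS
    # Afterwards:     sacct -j "${jobid}" --format=JobID,JobName,Elapsed,State
else
    echo "Slurm not found: steps_job.sh was written but not submitted."
fi
```

Had the script called ./parallel_app without srun, a single instance would run on the first node only and would not appear as a separate step in sstat or sacct.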