
In Slurm, is there a quick command to determine the total number of jobs (pending and active) at a given moment?

Tags: linux, bash, slurm

In Slurm, the command squeue -u <username> lists all jobs that are pending or active for a given user. Is there a quick way to tally them, so that I know how many outstanding jobs there are, counting both pending and actively running jobs? Thanks!

user321627 asked Oct 29 '18 00:10



2 Answers

I would interpret "quick command" differently. I would also add -r for cases where you are using job arrays:

squeue -u <username> -h -t pending,running -r | wc -l

The -h option removes the header, -t pending,running restricts the output to those two states, -r prints each job array element on its own line, and wc -l counts the lines of output. I usually run it under watch:

watch 'squeue -u <username> -h -t pending,running -r | wc -l'
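For repeated use, the one-liner above could be wrapped in a small shell function. This is only a sketch: the name count_jobs is made up for illustration, and defaulting to the current user via id -un is an assumption, not part of the original answer.

```shell
# Count a user's pending and running jobs, one line per job array element.
# count_jobs is a hypothetical helper name; with no argument it falls back
# to the current user as reported by `id -un`.
count_jobs() {
    # -h drops the header, -r expands job arrays, -t filters by state
    squeue -u "${1:-$(id -un)}" -h -r -t pending,running | wc -l
}
```

Calling count_jobs <username> should then print a single number, and count_jobs with no argument counts your own jobs.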
Stefan Maschek answered Sep 26 '22 20:09


If you just want to summarize the output of squeue, how about:

squeue -u <username> | awk '
BEGIN {
    abbrev["R"]="(Running)"
    abbrev["PD"]="(Pending)"
    abbrev["CG"]="(Completing)"
    abbrev["F"]="(Failed)"
}
NR>1 {a[$5]++}
END {
    for (i in a) {
        printf "%-2s %-12s %d\n", i, abbrev[i], a[i]
    }
}'

which yields something like:

R  (Running)    1
PD (Pending)    4

Explanations:

  • The job state is assumed to be in the 5th field according to the default format of squeue.
  • Then the script counts the occurrences of each job state code, skipping the 1st line, which contains the header.
  • Finally it reports the count of each job state code.
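Because the 5th-field assumption can break when the output format is customized or a job name contains spaces, one possible variant asks squeue to print only the state column and lets sort and uniq do the counting. This is a sketch under that assumption; state_tally is a hypothetical name, and %T is squeue's format specifier for the long state name (for example RUNNING or PENDING).

```shell
# Tally jobs per state without depending on column positions.
# state_tally is a hypothetical helper name; any arguments are passed
# through to squeue (for example: state_tally -u <username>).
state_tally() {
    # -h: no header; -o '%T': print only the long job state name
    squeue -h -o '%T' "$@" | sort | uniq -c
}
```

Each output line is a count followed by a state name. Adding -r to the arguments counts job array elements individually.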

In order to make it handy, add the following lines to your .bash_aliases or .bashrc (the filename may depend on the system):

function summary() {
    squeue "$@" | awk '
    BEGIN {
        abbrev["R"]="(Running)"
        abbrev["PD"]="(Pending)"
        abbrev["CG"]="(Completing)"
        abbrev["F"]="(Failed)"
    }
    NR>1 {a[$5]++}
    END {
        for (i in a) {
            printf "%-2s %-12s %d\n", i, abbrev[i], a[i]
        }
    }'
}

Then you can invoke it simply as summary [option], where [option] stands for any squeue options you want to pass through, for example summary -u <username> (often unnecessary).

Hope this helps.

tshiono answered Sep 26 '22 20:09