When using bsub with LSF, the -o option gave a lot of details, such as when the job started and ended and how much memory and CPU time the job took. With SLURM, all I get is the same standard output that I'd get from running the script without a scheduler.
For example, given this Perl 6 script:
warn "standard error stream";
say "standard output stream";
Submitted thus:
sbatch -o test.o%j -e test.e%j -J test_warn --wrap 'perl6 test.p6'
Resulted in the file test.o34380:
Testing standard output
and the file test.e34380:
Testing standard error in block <unit> at test.p6:2
With LSF, I'd get all kinds of details in the standard output file, something like:
Sender: LSF System <lsfadmin@my_node>
Subject: Job 347511: <test> Done
Job <test> was submitted from host <my_cluster> by user <username> in cluster <my_cluster_act>.
Job was executed on host(s) <my_node>, in queue <normal>, as user <username> in cluster <my_cluster_act>.
</home/username> was used as the home directory.
</path/to/working/directory> was used as the working directory.
Started at Mon Mar 16 13:10:23 2015
Results reported at Mon Mar 16 13:10:29 2015
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
perl6 test.p6
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 0.19 sec.
Max Memory : 0.10 MB
Max Swap : 0.10 MB
Max Processes : 2
Max Threads : 3
The output (if any) follows:
standard output stream
PS:
Read file <test.e_347511> for stderr output of this job.
Update:
One or more -v flags to sbatch give more preliminary information, but they don't change the standard output.
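For instance, resubmitting the example script with doubled verbosity (this resubmission is purely illustrative):
# Repeat -v to increase sbatch's own verbosity; the job's output is unaffected.
sbatch -v -v -o test.o%j -e test.e%j -J test_warn --wrap 'perl6 test.p6'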
For recent jobs, try
sacct -l
Look under the "Job Accounting Fields" section of the sacct documentation for descriptions of each of the three dozen or so columns in the output.
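For example, to see the full accounting record for a single job (the job ID here is just the one from the example above):
# -l shows the full set of accounting columns; -j limits output to one job.
sacct -l -j 34380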
To get just the job ID, maximum RAM used, maximum virtual memory size, start time, end time, CPU time in seconds, and the list of nodes on which the job ran, use the command below. By default this only reports jobs run the same day (see the --starttime or --endtime options for getting info on jobs from other days):
sacct --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList
This will give you output like:
JobID MaxRSS MaxVMSize Start End CPUTimeRAW NodeList
------------ ------- ---------- ------------------- ------------------- ---------- --------
36511 2015-04-29T11:34:37 2015-04-29T11:34:37 0 c50b-20
36511.batch 660K 181988K 2015-04-29T11:34:37 2015-04-29T11:34:37 0 c50b-20
36514 2015-04-29T12:18:46 2015-04-29T12:18:46 0 c50b-20
36514.batch 656K 181988K 2015-04-29T12:18:46 2015-04-29T12:18:46 0 c50b-20
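The same format string can be combined with -j to inspect a single job and its steps rather than everything from today (the job ID is illustrative):
# Same columns as above, restricted to job 36514 and its batch step.
sacct -j 36514 --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList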
Use --state COMPLETED to check previously completed jobs. When checking a state other than RUNNING, you have to give a start or end time:
sacct --starttime 08/01/15 --state COMPLETED --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList,ReqCPUS,ReqMem,Elapsed,Timelimit
You can also get the working directory of a job using scontrol:
scontrol show job 36514
This will give you output like:
JobId=36514 JobName=sbatch
UserId=username(123456) GroupId=my_group(678)
......
WorkDir=/path/to/work/dir
However, by default, scontrol can only access that information for about five minutes after the job finishes, after which it is purged from memory.
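One way around the purge is to capture the record from inside the job script itself, while the job is still known to scontrol; a minimal sketch (the job name, script body, and output filename are placeholders):
#!/bin/bash
#SBATCH -J test_scontrol
perl6 test.p6
# Save the full job record (WorkDir, times, etc.) before it is purged.
scontrol show job "$SLURM_JOB_ID" > "job_${SLURM_JOB_ID}.details"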
At the end of each job script, I insert
sstat -j $SLURM_JOB_ID.batch --format=JobID,MaxVMSize
to add RAM usage to the standard output.
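A minimal sketch of such a job script, assuming the example from the question as the payload:
#!/bin/bash
#SBATCH -o test.o%j
#SBATCH -e test.e%j
#SBATCH -J test_warn
perl6 test.p6
# Append peak memory of the batch step to this job's standard output file.
sstat -j "$SLURM_JOB_ID.batch" --format=JobID,MaxVMSize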