I have a job running on a Linux machine managed by Slurm. The job has now been running for a few hours, and I realize that I underestimated the time it needs to finish, so the value I specified for the --time argument is not enough.
Is there a way to add time to an existing running job through Slurm?
You can see all jobs running under an account by running squeue -A account_name, and then get more information on each job with scontrol show job <jobid>.
ReqNodeNotAvail: if you have requested a specific node and it is already scheduled for other work, your job can show this reason code.
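Before deciding how much time to add, it helps to know how much the job has left. The Python below is a minimal sketch, not an official tool: the function names are hypothetical, it assumes the whitespace-separated Key=Value layout that scontrol show job prints for a single job, and it only handles [days-]hours:minutes:seconds values (a TimeLimit of UNLIMITED raises ValueError).

```python
import subprocess


def parse_scontrol_job(text: str) -> dict[str, str]:
    """Turn `scontrol show job` output into a {Key: value} dict.

    Assumes values contain no spaces, which holds for RunTime and
    TimeLimit but not necessarily for every field.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not Key=Value pairs
            fields[key] = value
    return fields


def slurm_time_to_seconds(value: str) -> int:
    """Convert "[days-]hours:minutes:seconds" to seconds.

    Raises ValueError for values such as "UNLIMITED".
    """
    days, _, clock = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((int(days or 0) * 24 + hours) * 60 + minutes) * 60 + seconds


def seconds_left(job_id: str) -> int:
    """Remaining wall time of a running job: TimeLimit minus RunTime."""
    out = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = parse_scontrol_job(out)
    return slurm_time_to_seconds(fields["TimeLimit"]) - slurm_time_to_seconds(fields["RunTime"])
```

For example, on made-up output containing RunTime=05:12:40 TimeLimit=08:00:00, the two fields convert to 18760 and 28800 seconds, leaving 10040 seconds (about 2 h 47 min).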
Please note that the SLURM scheduler can handle at most 10000 jobs. It is best to keep the number of jobs you have submitted at any given time below half of that (5000), in case another user also wants to submit a large number of jobs.
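If you script large submissions, a pre-check against this guideline is easy to sketch. The Python below is a minimal illustration under assumptions: the helper names are hypothetical, it assumes the 10000 figure applies on your cluster, and it counts array tasks individually (squeue's -r flag prints one array task per line), which may be stricter than how the scheduler itself counts them.

```python
import subprocess

SCHEDULER_MAX_JOBS = 10000                   # hard maximum quoted above
SUGGESTED_CEILING = SCHEDULER_MAX_JOBS // 2  # "less than half this amount"


def count_jobs(squeue_output: str) -> int:
    """Count non-empty lines in `squeue -h -r -o %i` output (one job ID per line)."""
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def my_job_count(user: str) -> int:
    """Ask squeue for the user's queued jobs, without a header line."""
    out = subprocess.run(
        ["squeue", "-h", "-r", "-u", user, "-o", "%i"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_jobs(out)


def ok_to_submit(current: int, new_jobs: int) -> bool:
    """True if submitting new_jobs keeps the total below the suggested ceiling."""
    return current + new_jobs < SUGGESTED_CEILING
```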
Job information
Information on all running and pending batch jobs managed by SLURM can be obtained with the squeue command. Note that information on completed jobs is retained only for a limited period; information on jobs that ran in the past is available via sacct.
Use the scontrol command to modify a job
scontrol update jobid=<job_id> TimeLimit=<new_timelimit>
Use the SLURM time format, e.g. for 8 days 15 hours: TimeLimit=8-15:00:00
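If you compute the new limit in a script (for example, the current limit plus a safety margin), the string format is easy to get wrong. A minimal Python sketch, assuming you want an absolute limit in the days-hours:minutes:seconds form shown above; the function name is hypothetical.

```python
from datetime import timedelta


def to_slurm_time(duration: timedelta) -> str:
    """Format a duration for TimeLimit= as days-hours:minutes:seconds.

    The days- prefix is dropped for durations under one day, giving
    hours:minutes:seconds, which scontrol also accepts.
    """
    total = int(duration.total_seconds())
    if total <= 0:
        raise ValueError("time limit must be positive")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    clock = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{days}-{clock}" if days else clock
```

For instance, to_slurm_time(timedelta(days=8, hours=15)) gives "8-15:00:00", the same value as the example above.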
Increasing the time limit this way requires admin privileges on some machines.
On most machines, regular users are allowed to do it only while the job is still pending, not once it is running.
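To extend rather than replace the limit, the scontrol man page for recent Slurm versions describes a relative form, TimeLimit+=<minutes>, in which jobid must come before TimeLimit. The Python below is a sketch that builds either form and runs it; the function names are hypothetical, and whether the call succeeds depends on the privileges described above. Check man scontrol on your cluster before relying on the += syntax.

```python
import subprocess


def build_extend_cmd(job_id: int, minutes: int, relative: bool = True) -> list[str]:
    """Build an scontrol argv that extends or sets a job's time limit.

    relative=True  -> "TimeLimit+=<minutes>" adds to the current limit.
    relative=False -> "TimeLimit=<minutes>" sets the limit outright.
    jobid is placed first, as the relative form requires.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    op = "+=" if relative else "="
    return ["scontrol", "update", f"jobid={job_id}", f"TimeLimit{op}{minutes}"]


def try_extend(job_id: int, minutes: int) -> tuple[bool, str]:
    """Run the update and return (succeeded, scontrol's message)."""
    result = subprocess.run(
        build_extend_cmd(job_id, minutes), capture_output=True, text=True
    )
    # If the update is refused (e.g. missing privileges), scontrol exits
    # non-zero and reports the reason on stderr.
    return result.returncode == 0, (result.stderr or result.stdout).strip()
```

On a cluster where only administrators may raise limits, try_extend returns False together with scontrol's error message, which you can pass on to your cluster's support team along with the job ID.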