
How to run a longer job in SLURM if the partition's default time limit is not sufficient?

Tags:

mpi

hpc

slurm

I have submitted my job on a Linux cluster (which uses SLURM to schedule jobs), but the time limit of each partition is only 24 hours (this limit is set by the admin), and my code seems to need more than a week to run (as per my guess). I am new to SLURM scripts and understand very little about the interplay between the following:

#SBATCH --nodes=
#SBATCH --ntasks-per-node=
#SBATCH --ntasks=
#SBATCH --ntasks-per-core=

I am looking for a way to work around the time limit when submitting the job so that my complete job can run.

Suggestions are appreciated.

Asked Dec 13 '25 by Bhuwan Poudel


1 Answer

The time limit is set by the admin and is defined in slurm.conf (typically at /etc/slurm/slurm.conf), where each partition entry carries its own limit.

I am afraid you cannot bypass that limit.
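
You can check the configured limit for each partition yourself. A minimal sketch, assuming a partition named "batch" (the name is just a placeholder):

# list every partition together with its maximum wall-clock time
sinfo -o "%P %l"

# show the full configuration of one partition, e.g. "batch"
scontrol show partition batch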

So the only things you can do are:

  1. Run for 24 hours and, shortly before the limit is reached, save all the state needed to restart (this can be difficult, AFAIK).
  2. Ask the admin to increase the time limit.
  3. Use more nodes, cores, or threads so the job finishes within the limit.

For option 1 you need to modify the program so it saves its state (checkpointing), which most programs intended to run for long durations should already provide. A rough sketch of how that fits into a batch script is shown below.
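
This is only an illustration, not your actual setup: my_solver, checkpoint.dat, the job.sh file name and the 10-minute warning are all assumptions. The idea is to have SLURM send a signal shortly before the limit so the batch script can resubmit itself, while the program itself is responsible for writing checkpoints and resuming from them:

#!/bin/bash
#SBATCH --job-name=long-run
#SBATCH --time=24:00:00
#SBATCH --signal=B:USR1@600    # send SIGUSR1 to the batch shell 10 min before the limit

CHECKPOINT=checkpoint.dat      # hypothetical checkpoint file written by the solver

resubmit() {
    echo "Time limit approaching, resubmitting the next 24-hour chunk"
    sbatch "$SLURM_SUBMIT_DIR/job.sh"   # assumed name of this script
    exit 0
}
trap resubmit USR1

# run the (hypothetical) solver in the background so the trap can fire,
# resuming from the checkpoint if one already exists
if [ -f "$CHECKPOINT" ]; then
    ./my_solver --resume "$CHECKPOINT" &
else
    ./my_solver &
fi
wait

Each resubmission starts a fresh 24-hour allocation, so the overall run is split into restartable chunks. If the chunks must run strictly one after another, you can also chain them with sbatch --dependency=afterany:<jobid>.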

It seems you are from Nepal; if you happen to be running this on the Kathmandu University HPC, you can ask the administration and they should be able to help you here.

Regarding your second question:

#SBATCH --nodes=
#SBATCH --ntasks-per-node=
#SBATCH --ntasks=
#SBATCH --ntasks-per-core=

--nodes means the number of physical nodes.

For the ntasks-related options, I recommend looking at this question: What does the --ntasks or -n tasks does in SLURM?
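
As a quick illustration of how these directives combine (the numbers are arbitrary and my_mpi_program is a placeholder): asking for 2 nodes with 16 tasks per node gives 32 MPI ranks in total.

#SBATCH --nodes=2              # 2 physical nodes
#SBATCH --ntasks-per-node=16   # 16 MPI tasks on each node
#SBATCH --ntasks=32            # total tasks = nodes * ntasks-per-node

srun ./my_mpi_program          # srun launches all 32 tasks

In practice you usually give either --ntasks or the --nodes/--ntasks-per-node pair and let SLURM derive the rest; --ntasks-per-core mainly matters on nodes with hyper-threading enabled.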

Answered Dec 15 '25 by Shirshak55


