 

How can a Python script wait for Slurm jobs to finish?

Tags:

python

slurm

I have a Python script that generates a bunch of inputs for an external program. The external program will be called through Slurm.

What I want is for my script to wait until all the generated calls to the external program have finished (not just the Slurm submission commands, but the actual execution of the external program), and then parse the outputs generated by the external program and do some work with the data.

I tried subprocess.call, but it only waits for the Slurm submission command to return. Any suggestions?

Anon asked Aug 14 '18

2 Answers

You can run your sbatch commands asynchronously in subprocesses as you tried before, but use the -W or --wait command line option of sbatch. This will cause the subprocess not to return until the job has terminated. You can then block the execution of your main program until all of the subprocesses complete. As a bonus, this also allows you to handle unexpected return values from your external program. See the sbatch documentation for more information.
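
A minimal sketch of that approach, assuming the batch scripts already exist (the script names below are only placeholders):

import subprocess

# Launch one sbatch per batch script; --wait keeps each subprocess
# alive until the corresponding Slurm job has terminated.
job_scripts = ["job_01.sh", "job_02.sh"]  # placeholder script names
procs = [subprocess.Popen(["sbatch", "--wait", script]) for script in job_scripts]

# Block until every job has finished; the exit code of sbatch --wait
# mirrors the exit code of the job, so failures can be detected here.
return_codes = [p.wait() for p in procs]
if any(code != 0 for code in return_codes):
    raise RuntimeError("at least one Slurm job failed")

# At this point it is safe to parse the output files.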

John answered Sep 20 '22

solution 1

I would suggest breaking your pipeline up into smaller steps, which can then be automated in a bash script etc. First you generate all the commands that need to be run through Slurm. If you submit them as a Slurm job array (see e.g. here), you can simultaneously submit the script that parses the output of all these commands. Using Slurm dependencies, you can make this job start only after the job array has finished.
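
A rough sketch of that flow, driven from Python; the array range and the scripts run_array.sh and parse.sh are placeholders for your own batch scripts:

import subprocess

# Submit the job array; --parsable makes sbatch print only the job id
# (possibly followed by ";cluster"), which we capture for the dependency.
result = subprocess.run(
    ["sbatch", "--parsable", "--array=0-99", "run_array.sh"],
    capture_output=True, text=True, check=True
)
array_job_id = result.stdout.strip().split(";")[0]

# Submit the parsing job right away, but let Slurm hold it until every
# task of the array has finished successfully (afterok dependency).
subprocess.run(
    ["sbatch", f"--dependency=afterok:{array_job_id}", "parse.sh"],
    check=True
)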

solution 2

You could run a while loop in your Python script and check the status of the jobs:

import subprocess
import time

time_limit = 3600     # give up after this many seconds
poll_interval = 60    # seconds to sleep between checks
job_name = "my_job"   # the --job-name you gave your Slurm jobs

t = time.time()
while True:
    # Break if this takes more than time_limit
    if time.time() - t > time_limit:
        break
    # Check if the jobs are done by querying squeue for jobs with the
    # name you gave them; empty output means nothing is queued or running
    out = subprocess.run(
        ["squeue", "--noheader", "--name", job_name],
        capture_output=True, text=True
    ).stdout
    if not out.strip():
        break
    # Sleep for a while depending on the estimated completion time of the jobs
    time.sleep(poll_interval)

solution 3

Reserve N nodes on Slurm and run your script there. This avoids cluttering the front-end node. I suggest GNU parallel to distribute your jobs across the nodes.

user2653663 answered Sep 17 '22