How can I automatically run a bash script when my qsub jobs have finished on a server?

I would like to run a script when all of the jobs that I have sent to a server are done.

For example, I send:

ssh server 'for i in config*; do qsub ./run 1 "$i"; done'

And I get back a list of the jobs that were started. I would like to automatically start another script on the server to process the output from these jobs once all are completed.

I would appreciate any advice that would help me avoid the following inelegant solution:

If I saved each of the 1000 job IDs from the above call in a separate file, I could check the contents of each file against the current list of running jobs, i.e. the output of a call to:

ssh server qstat

I would only need to check every half hour, but I would imagine that there is a better way.

David LeBauer asked Oct 07 '10 21:10


1 Answer

It depends a bit on what job scheduler you are using and what version, but there's another approach that can be taken too if your results-processing can also be done on the same queue as the job.

One very handy way of managing lots of related jobs in more recent versions of torque (and with grid engine, and others) is to launch the individual jobs as a job array (cf. http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm#-t). This requires mapping the individual runs to numbers somehow, which may or may not be convenient; but if you can do it for your jobs, it greatly simplifies managing them: you can qsub them all in one line, and you can qdel or qhold them all at once (while still being able to deal with jobs individually).
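One way to do the run-to-number mapping is to index into the sorted list of config files. This is only a minimal sketch: `config_for_index` is a hypothetical helper, and it assumes the files still match `config*` and that the glob order is the same at submission and at run time.

```shell
#!/bin/sh
# Hypothetical helper: print the config file for a 1-based array index.
# Assumes the config files match config* in the given directory
# (default: the current directory).
config_for_index() {
    index=$1
    dir=${2:-.}
    # Load the matching files into the positional parameters, sorted by name.
    set -- "$dir"/config*
    # If the glob matched nothing, $1 is the unexpanded pattern.
    if [ ! -e "$1" ]; then
        echo "no config files found in $dir" >&2
        return 1
    fi
    if [ "$index" -lt 1 ] || [ "$index" -gt "$#" ]; then
        echo "no config file for index $index" >&2
        return 1
    fi
    # Drop the first (index - 1) files so the wanted one is $1.
    shift $((index - 1))
    printf '%s\n' "$1"
}

# Inside the array job script, Torque sets PBS_ARRAYID for each sub-job
# (Grid Engine uses SGE_TASK_ID instead), so the job body could be:
#   ./run 1 "$(config_for_index "$PBS_ARRAYID")"
```

With three files `config_a`, `config_b` and `config_c`, index 2 selects `config_b`, so an array submitted with `-t 1-3` covers each file exactly once.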

If you do this, you can then submit an analysis job with a dependency on the job array, so that it runs only once all of the jobs in the array are complete (cf. http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm#dependencyExamples). Submitting the job would look like:

qsub analyze.sh -W depend=afterokarray:427[]

where analyze.sh is the script that does the analysis, and 427 is the job ID of the array of jobs you launched. (The [] refers to the whole array, so the analysis runs only after every job in it has completed successfully.) The syntax differs for other schedulers (e.g., SGE/OGE), but the ideas are the same.
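Putting the two submissions together might look roughly like the sketch below. It assumes Torque. `submit_with_analysis` and `run_array.sh` are hypothetical names, and trimming the server suffix from the returned ID is my assumption, made to match the `427[]` form above. Check what your own qsub prints before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper: submit an array of N runs, then an analysis job
# that is held until every job in the array has finished successfully.
submit_with_analysis() {
    n_jobs=$1
    # qsub prints the ID of the new job; for an array on Torque this
    # typically looks like "427[].servername".
    array_id=$(qsub -t "1-$n_jobs" run_array.sh) || return 1
    # Assumption: keep only the part before the first dot, e.g. "427[]".
    short_id=${array_id%%.*}
    # Quote the dependency so the shell never treats [] as a glob.
    qsub -W "depend=afterokarray:$short_id" analyze.sh
}

# Usage on the server (commented out so that sourcing this file submits nothing):
#   submit_with_analysis 1000
```

The function prints the ID of the analysis job, which can be passed to qstat or qdel later. On Grid Engine the analogous mechanism is `qsub -hold_jid`, with a different syntax.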

Getting this right can take some doing, and Tristan's approach certainly has the advantage of being simple and working with any scheduler; but if you'll be doing a lot of this, learning to use job arrays may be worth your time.

Jonathan Dursi answered Sep 28 '22 06:09