
Bash script that makes qsub in TORQUE wait until the job finishes, much like -sync y in SGE

Tags: bash, qsub, torque

I'm using a cluster with the Torque/Maui system. I have a bash script that submits one job with the qsub command and afterwards does several things, such as moving files, writing ASCII files, and checking the output of the job I submitted. If that output contains the number 1, the job needs to be submitted again. If it is anything other than 1, the bash script does something else.

The problem is that qsub returns immediately while the job runs in the background, so the whole script is evaluated at once. I'd like to force qsub to behave like awk, cat, sort, etc., where the script only moves on after the command has finished (unless it is put in the background).

So I need the script to stop at the first qsub and continue only after the submitted job has finished. Is there any way of doing this? It would be something similar to:

   -sync y    # in the SGE system, for instance.

What I have:

#!/bin/bash
.
.
some commands
.
.
qsub my_application  # need to wait until my_application get done
.
.
more commands
.
.
my_application_output=(`cat my_application_output.txt`)

case "$my_application_output" in
1)
     qsub my_application
     ;;
0)
     some commands
     ;;
100)
     some commands
     ;;
*)
     some commands
     exit 1
     ;;
esac

.
.

Some remarks:

  • It is not convenient to use qsub -I -x, since I'd like to keep the output in the output file and do not want to lock up the node by starting an interactive session (-I).
  • I guess it is not a simple job dependency problem, since the resubmission may or may not occur and, most importantly, if it does occur, it can happen several times.

Thanks to all.

Quim asked Sep 30 '22 17:09


2 Answers

Quim Oct 3 at 4:05: "it is not a simple job dependency problem"

You must create a simple job dependency problem--simple enough for your script to handle, anyway. In fact, your script already gates on my_application_output.txt, so why not just sleep until that file appears? Something like:

#!/usr/bin/env bash
# I prefer to have constants at the top
my_application_output_fp='/path/to/my_application_output.txt' 
#
#
# some commands
#
#
qsub my_application
#
#
# more commands
#
#

# sleep until my_application outputs
while [[ ! -r "${my_application_output_fp}" ]] ; do
    sleep 1
done

my_application_output="$(cat "${my_application_output_fp}")"
# process it

If my_application_output.txt gets written too long before the end of my_application, change my_application to write a flag file just before it exits, and gate on that:

#!/usr/bin/env bash
my_application_flag_fp='/path/to/my_application_flag.txt' 
my_application_output_fp='/path/to/my_application_output.txt' 
#
#
# some commands
#
#
qsub my_application
#
#
# more commands
#
#

# sleep until my_application writes flag
while [[ ! -r "${my_application_flag_fp}" ]] ; do
    sleep 1
done

if [[ ! -r "${my_application_output_fp}" ]] ; then
    # handle error (placeholder: report the missing file and stop)
    echo "missing ${my_application_output_fp}" >&2
    exit 1
fi
my_application_output="$(cat "${my_application_output_fp}")"
# process it
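
One thing to watch with either loop: if a flag or output file is left over from an earlier run, the loop ends immediately, and if the job dies without writing anything, the loop never ends. A minimal sketch of a helper that addresses both, assuming whole-second intervals; the name wait_for_file, its arguments, and the default timeout are made up for illustration:

```shell
#!/usr/bin/env bash

# wait_for_file PATH [TIMEOUT] [INTERVAL]
# Hypothetical helper: poll until PATH is readable.
# Returns 0 once the file shows up, 1 if TIMEOUT seconds pass first.
# TIMEOUT and INTERVAL must be whole seconds (bash arithmetic is integer-only).
wait_for_file() {
    local path="$1" timeout="${2:-3600}" interval="${3:-1}"
    local waited=0
    while [[ ! -r "$path" ]]; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}

# Possible usage (paths are placeholders):
#   rm -f "${my_application_flag_fp}"      # clear a stale flag BEFORE submitting
#   qsub my_application
#   wait_for_file "${my_application_flag_fp}" 7200 5 || { echo "timed out" >&2; exit 1; }
```

Removing the flag before each qsub matters most in the resubmission case, where the same file would otherwise still be there from the previous attempt.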
TomRoche answered Oct 11 '22 16:10


The qsub command prints the ID of the submitted job, something like:

$ qsub myapplication
12345.hpc.host

You can then use it to check the status of your job with the qstat command,

$ qstat 12345.hpc.host
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
12345.hpc.host            STDIN            user            00:00:00 Q queue

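Once the job has completed, qstat no longer displays it (on servers that set keep_completed, this happens only after that retention period, during which the job is still listed in state C). In that case,

$ qstat 12345.hpc.host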
qstat: Unknown Job Id Error 12345.hpc.host

In fact, the output is not even necessary. One can discard it to /dev/null and simply check the exit status of the qstat command,

if qstat 12345.hpc.host &>/dev/null; then
    echo "Job is running"
else
    echo "Job is not running"
fi

Or even shorter,

qstat 12345.hpc.host &> /dev/null && echo "Job is running" || echo "Job is NOT running"
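
If your server keeps finished jobs listed for a while (state C), the exit-status check alone will report them as still running. A rough sketch of a stricter check, assuming that qstat -f prints a line of the form "job_state = X" as it does on the Torque installations I have seen; the function name job_is_active is made up:

```shell
#!/usr/bin/env bash

# job_is_active JOBID
# Hypothetical helper: succeeds while qstat knows the job AND its state is not C.
# Assumes `qstat -f` output contains a line like "    job_state = R".
job_is_active() {
    local state
    # If qstat fails (unknown job), awk prints nothing and state stays empty.
    state=$(qstat -f "$1" 2> /dev/null | awk '$1 == "job_state" {print $3; exit}')
    [[ -n "$state" && "$state" != "C" ]]
}

# Possible usage:
#   job_is_active 12345.hpc.host && echo "Job is running" || echo "Job is NOT running"
```

Check the output of qstat -f on your own cluster before relying on this, since the format is an assumption here.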

So what you want to achieve should now be rather simple. Launch the job, store its id in a variable and sleep until the qstat command fails,

JOBID=$(qsub myapplication)
while qstat "$JOBID" &> /dev/null; do
    sleep 5
done

You can store the while loop in a bash function to use in all your processing scripts. You can also expand on this idea to launch and wait for a list of jobs to run.
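
As a rough sketch of both ideas, here is the loop wrapped in a function that accepts several job IDs, plus a resubmission loop along the lines of the question. The names wait_for_jobs, submit_until_ok and POLL_INTERVAL are made up, and the sketch assumes the job writes its result (1 meaning "run again") as the first word of the output file:

```shell
#!/usr/bin/env bash

# wait_for_jobs JOBID...
# Hypothetical helper: block until qstat no longer knows any of the given jobs.
# POLL_INTERVAL (seconds) is an assumed knob; it defaults to 5 as in the loop above.
wait_for_jobs() {
    local interval="${POLL_INTERVAL:-5}"
    local jobid
    for jobid in "$@"; do
        while qstat "$jobid" &> /dev/null; do
            sleep "$interval"
        done
    done
}

# submit_until_ok SCRIPT OUTPUT_FILE
# Hypothetical helper: submit SCRIPT, wait for it, and resubmit for as long as
# the first word of OUTPUT_FILE is 1. Prints the final result on stdout.
submit_until_ok() {
    local script="$1" output_file="$2"
    local jobid result
    while true; do
        rm -f -- "$output_file"              # do not read a result from an earlier run
        jobid=$(qsub "$script") || return 1  # give up if the submission itself fails
        wait_for_jobs "$jobid"
        result=""
        [[ -r "$output_file" ]] && read -r result _ < "$output_file"
        [[ "$result" == "1" ]] || break
    done
    printf '%s\n' "$result"
}

# Possible usage:
#   my_application_output=$(submit_until_ok my_application my_application_output.txt)
#   case "$my_application_output" in ... esac
```

Several jobs can be waited on at once with wait_for_jobs "$JOB1" "$JOB2". An empty result means the output file never appeared, which the calling script should treat as an error.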

Pedro Inácio answered Oct 11 '22 16:10