There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|
) or a redirect (>
). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt
would look like
Your job 123454321 ("echo") has been submitted
Similarly if I used a pipe (echo "hello world" | myprogram
), that message is all that would be passed to myprogram
, not the actual stdout.
I'm aware I could write a small bash script that each contain the command with the pipe/redirect, and then do qsub ./myscript.sh
. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts each with a slightly different command. When scripting this solution can start to feel very hackish. An example of such a script in Python:
for i, (infile1, infile2, outfile) in enumerate(files):
command = ("bowtie -S %s %s | " +
"samtools view -bS - > %s\n") % (infile1, infile2, outfile)
script = "job" + str(counter) + ".sh"
open(script, "w").write(command)
os.system("chmod 755 %s" % script)
os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh
scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt
command to qsub without having to create another file containing the command?
The | command is called a pipe. It is used to pipe, or transfer, the standard output from the command on its left into the standard input of the command on its right. # First, echo "Hello World" will send Hello World to the standard output.
Pipe is used to combine two or more commands, and in this, the output of one command acts as input to another command, and this command's output may act as input to the next command and so on. It can also be visualized as a temporary connection between two or more commands/ programs/ processes.
The qsub command is used to submit jobs to the queue. job, as previously mentioned, is a program or task that may be assigned to run on a cluster system. qsub command is itself simple, however, it to actually run your desired program may be a bit tricky. is because qsub, when used as designed, will only run scripts.
You can do this by turning it into a bash -c
command, which lets you put the |
in a quoted statement:
qsub bash -c "cmd <options> | cmd2 <options>"
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
echo "cmd <options> | cmd2 <options>" | qsub
as well.
Although my answer is a bit late I am adding it for any incoming viewers. To use a pipe/direct and submit that as a qsub job you need to do a couple of things. But first, using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. Your code will run serially rather than get parallelized).
Suppose you have a file params-query.txt which hold several bowtie commands and piped calls to samtools of the following form:
bowtie -q query -1 param1 -2 param2 ... | samtools ...
To send each query as a separate job first prepare your command line units from STDIN through xargs STDIN. Notice the quotes around the braces are important if you are submitting a command of piped parts. That way your entire query is treated a single unit.
cat params-query.txt | xargs -i echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
If that didn't work as expected then you're probably better off generating an intermediate output between bowtie and samtools before calling samtools to accept that intermediate output. You won't need to change the qsub call through xargs but the code in params-query.txt should look like:
bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out
This page has interesting qsub tricks you might like
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With