
How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command

echo 'hello world' > hello.txt

(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:

qsub echo 'hello world' > hello.txt

the resulting content of hello.txt would look like

Your job 123454321 ("echo") has been submitted

Similarly if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
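To make the cause concrete: the local shell strips `> hello.txt` off the command line before qsub ever runs, so qsub only sees `echo 'hello world'` and its own submission message is what lands in the file. The sketch below approximates that split with Python's `shlex`. The helper name `split_redirect` is hypothetical, and this is only a rough stand-in for real shell parsing.

```python
import shlex


def split_redirect(command_line):
    """Roughly mimic how a shell separates argv from a `>` redirect target."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    tokens = list(lexer)
    if ">" in tokens:
        idx = tokens.index(">")
        # Everything before `>` is what the program (here: qsub) receives.
        return tokens[:idx], tokens[idx + 1]
    return tokens, None


argv, target = split_redirect("qsub echo 'hello world' > hello.txt")
print("qsub receives:", argv)
print("local shell writes qsub's own stdout to:", target)
```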

I'm aware I could write small bash scripts that each contain a command with the pipe/redirect, and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution can start to feel very hackish. An example of such a script in Python:

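import os

for i, (infile1, infile2, outfile) in enumerate(files):
    # Build the piped command that this job's script will contain.
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)

    # Write one script per job; it has to exist until the job starts.
    script = "job" + str(i) + ".sh"
    with open(script, "w") as f:
        f.write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)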

This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.

Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?

David Robinson asked Aug 19 '13 20:08



2 Answers

You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:

qsub bash -c "cmd <options> | cmd2 <options>"
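Applied to the Python loop from the question, the same idea might look like the sketch below. It is an illustration under assumptions, not a tested SGE recipe: the helper names `build_qsub_argv` and `submit` are hypothetical, the `-cwd` flag is simply carried over from the question's script, and the file names are made up. Passing an argument list to `subprocess.run` means no local shell ever sees the `|` or `>`, and `shlex.quote` protects file names inside the `bash -c` string.

```python
import shlex
import subprocess


def build_qsub_argv(infile1, infile2, outfile):
    """Return the argv for `qsub -cwd bash -c "<pipeline>"` (nothing is run)."""
    pipeline = "bowtie -S {} {} | samtools view -bS - > {}".format(
        shlex.quote(infile1), shlex.quote(infile2), shlex.quote(outfile)
    )
    return ["qsub", "-cwd", "bash", "-c", pipeline]


def submit(infile1, infile2, outfile):
    """Submit one job; requires qsub on PATH, so it is not called here."""
    return subprocess.run(build_qsub_argv(infile1, infile2, outfile), check=True)


if __name__ == "__main__":
    # Dry run: show what would be handed to qsub for a made-up input.
    print(build_qsub_argv("ref index", "reads.fq", "out.bam"))
```

Because no jobXX.sh files are written, there is nothing to clean up afterwards.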

As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:

echo "cmd <options> | cmd2 <options>" | qsub

as well.
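From Python, the stdin variant could be sketched as follows. The helper name `submit_via_stdin` and its `submit_cmd` parameter are hypothetical; the parameter exists only so the function can be dry-run with `cat` standing in for qsub. As noted above, this form reportedly does not work with SGE 8.1.3, so treat it as an option to try rather than a guaranteed fix.

```python
import subprocess


def submit_via_stdin(command, submit_cmd=("qsub",)):
    """Feed `command` to the submit program on stdin and return its stdout."""
    result = subprocess.run(
        list(submit_cmd),
        input=command + "\n",   # the job "script" is just this one line
        text=True,
        capture_output=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Dry run: `cat` echoes back what qsub would have read from stdin.
    print(submit_via_stdin("cmd <options> | cmd2 <options>", submit_cmd=("cat",)), end="")
```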

David Robinson answered Sep 28 '22 00:09


Although my answer is a bit late, I am adding it for any incoming viewers. To use a pipe/redirect and submit that as a qsub job, you need to do a couple of things. But first, using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. your code will run serially rather than get parallelized).

  1. Run qsub with binary mode enabled, since by default qsub expects a job script rather than an arbitrary command. For that you pass the "-b y" flag to qsub, and you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
  2. echo each call to qsub and then pipe that to a shell.

Suppose you have a file params-query.txt which holds several bowtie commands and piped calls to samtools of the following form:

bowtie -q query -1 param1 -2 param2 ... | samtools ...

To send each query as a separate job, first build one qsub command line per input line by feeding the file to xargs on STDIN. Notice that the quotes around the braces are important if you are submitting a command made of piped parts. That way your entire query is treated as a single unit.

cat params-query.txt | xargs -I {} echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
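For readers driving this from Python instead of xargs, a rough equivalent is sketched below. The helper name `build_jobs` is hypothetical, and the log and job names are the same placeholders used in the command above. Each non-empty line becomes one qsub argument list with the whole piped query as a single final argument, which is what the escaped quotes achieve in the shell version.

```python
import subprocess


def build_jobs(lines, job_name="job_name"):
    """Turn each non-empty line into one `qsub -b y ...` argv (nothing is run)."""
    jobs = []
    for line in lines:
        command = line.strip()
        if not command:
            continue  # skip blank lines in the params file
        jobs.append(
            ["qsub", "-b", "y", "-o", "output_log", "-e", "error_log",
             "-N", job_name, command]
        )
    return jobs


def submit_all(path):
    """Submit one job per line of `path`; requires qsub, so not called here."""
    with open(path) as handle:
        for argv in build_jobs(handle):
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run on a made-up line in the same shape as params-query.txt.
    for argv in build_jobs(["bowtie -q query -1 a -2 b | samtools view -\n"]):
        print(argv)
```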

If that doesn't work as expected, you're probably better off having bowtie write an intermediate output file and then calling samtools on that file. You won't need to change the qsub call through xargs, but each line in params-query.txt should look like:

bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out

This page has interesting qsub tricks you might like

bioSlayer answered Sep 28 '22 02:09