
GNU parallel: output each job to a different file

I am trying to process some text files with awk, using the parallel command in a shell script, but I haven't been able to get it to write each job's output to a different file.

If I try:

seq 10 | parallel awk \''{ if ( $5 > 0.4 ) print $2}'\' file{}.txt > file{}.out

It writes everything to a single file literally named file{}.out instead of file1.out, file2.out, etc.

The tutorial and man pages also suggest that I could use --files, but it just prints to stdout:

seq 10 | parallel awk \''{ if ( $5 > 0.4 ) print $2}'\' file{}.txt --files file{}.out
Scott Ritchie asked Mar 05 '14 03:03




2 Answers

It turns out I needed to quote the redirect, because the calling shell was processing it before parallel ever saw it:

seq 10 | parallel awk \''{...}'\' file{}.txt ">" file{}.out
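For a concrete picture of why the quoting matters, here is a minimal sketch, assuming GNU parallel is installed. The three sample input files and their contents are made up for illustration. An unquoted > is consumed by the calling shell before parallel starts, whereas the quoted ">" is passed through as an ordinary argument and ends up in each job's own command line.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# Hypothetical sample inputs: file1.txt .. file3.txt, five columns each.
for i in 1 2 3; do
    printf 'a keep%s c d 0.9\na drop%s c d 0.1\n' "$i" "$i" > "file$i.txt"
done

# The quoted ">" reaches parallel as an argument, so each job runs
#   awk '{ ... }' fileN.txt > fileN.out
# in its own shell, with {} replaced by the job's input value.
seq 3 | parallel awk \''{ if ( $5 > 0.4 ) print $2}'\' file{}.txt ">" file{}.out

# Show what each job wrote.
cat file1.out file2.out file3.out
```

Afterwards file1.out, file2.out and file3.out each hold the second column of the rows whose fifth column exceeds 0.4, and no file named file{}.out is created.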
Scott Ritchie answered Nov 10 '22 15:11


Another way is to put the entire command that parallel should run inside double quotes:

seq 10 | parallel " awk command > file{}.out "

Sometimes it is also useful to write the output to a file and to stdout at the same time. You can achieve that with tee. In this case, the command could be:

seq 10 | parallel " awk command | tee file{}.out "

Bruce_Warrior answered Nov 10 '22 16:11