
Getting syntax error using awk in parallel processing

I have 44 .tsv files in one folder, and I want to count the intersections for every pair of files using the intersect command of bedtools. Each output has 4 columns, and I only need to save the sum of column 4 for each pair. This works when I run the pairs one at a time, but when I use parallel to process all pairs at once I get a syntax error.

Here is the code and the result when I run a single pair manually:

$ bedtools intersect -a p1.tsv -b p2.tsv -c
chr1    1   5   1
chr1    8   12  1
chr1    18  20  1
chr1    21  25  0

$ bedtools intersect -a p1.tsv -b p2.tsv -c | awk '{sum+=$4} END {print sum}'
3

Here is the code and the result when I use parallel processing:

$ parallel "bedtools intersect -a {1} -b {2} -c |awk '{sum+=$4} END {print sum}'> {1}.{2}.intersect" ::: `ls *.tsv` ::: `ls *.tsv`

awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error

The result should be 44*44 files, each containing a single value, for example just 3.

Nastaran Esfahani asked Jan 26 '23 03:01


1 Answer

@DudiBoy has a good solution, but I find it annoying to have to create another file just to call GNU Parallel.

So you can also use functions. This way you do not need to make a new file:

doit() {
  bedtools intersect -a "$1" -b "$2" -c | awk '{sum+=$4} END {print sum}'
}
export -f doit

parallel --results {1}.{2}.intersect doit {1} {2} ::: *.tsv ::: *.tsv
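
The error output in the question, `{sum+=} END {print sum}`, suggests why the original one-liner failed: the whole command was wrapped in double quotes, so the shell expanded `$4` (an empty positional parameter) before awk ever saw it. The function approach avoids this because the awk program sits in plain single quotes. Below is a minimal sketch of the effect using only awk and printf; the `sample` function is a hypothetical stand-in for the bedtools output shown in the question.

```shell
# Hypothetical stand-in for the bedtools output shown in the question.
sample() {
  printf 'chr1\t1\t5\t1\nchr1\t8\t12\t1\nchr1\t18\t20\t1\nchr1\t21\t25\t0\n'
}

# Clear the positional parameters so $4 is empty, as in a fresh interactive shell.
set --

# Inside double quotes the shell substitutes $4 itself, even though the
# awk program looks single-quoted. awk would receive: {sum+=} END {print sum}
broken="awk '{sum+=$4} END {print sum}'"
echo "$broken"

# Escaping the dollar sign keeps $4 intact for awk.
fixed="awk '{sum+=\$4} END {print sum}'"
echo "$fixed"

# Run the fixed command string through a shell, roughly as parallel would.
sample | sh -c "$fixed"
```

By the same reasoning, writing `\$4` inside the original double-quoted parallel command should also work, though the function version is easier to read and needs no escaping.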
Ole Tange answered Jan 29 '23 08:01