I have 44 .tsv files in one folder and I want to count the intersections for every pair of files using the intersect command of bedtools. Each output has 4 columns, and I only need to save the sum of column 4 for each pair. This works when I run the pairs one at a time, but when I use GNU Parallel to run everything at once I get a syntax error.
Here is the command and its output when I run a single pair manually:
$ bedtools intersect -a p1.tsv -b p2.tsv -c
chr1 1 5 1
chr1 8 12 1
chr1 18 20 1
chr1 21 25 0
$ bedtools intersect -a p1.tsv -b p2.tsv -c | awk '{sum+=$4} END {print sum}'
3
Here is the command and its output when I use GNU Parallel:
$ parallel "bedtools intersect -a {1} -b {2} -c |awk '{sum+=$4} END {print sum}'> {1}.{2}.intersect" ::: `ls *.tsv` ::: `ls *.tsv`
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1: ^ syntax error
The result should be 44*44 files, each containing a single value, for example just 3.
@DudiBoy has a good solution. But to me it is annoying that I have to make another file just because I want to call GNU Parallel.
So you can also use functions. This way you do not need to make a new file:
doit() {
  bedtools intersect -a "$1" -b "$2" -c | awk '{sum+=$4} END {print sum}'
}
export -f doit
parallel --results {1}.{2}.intersect doit {1} {2} ::: *.tsv ::: *.tsv
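For context on why the original command failed: the awk program sat inside a double-quoted string, so the shell expanded $4 (an unset positional parameter, hence empty) before awk ever saw it. That matches the `{sum+=}` in the error message. The function avoids this because its awk program stays in single quotes. A minimal sketch of the effect, using only bash and awk with the four sample rows from the question standing in for bedtools output (the variable names are just for illustration):

```shell
# Inside double quotes the shell expands $4 (normally empty) before awk runs.
broken="awk '{sum+=$4} END {print sum}'"

# Escaping the dollar sign keeps it literal, so awk receives the field reference.
fixed="awk '{sum+=\$4} END {print sum}'"

echo "$broken"
echo "$fixed"

# Feed the sample rows from the question through the correctly quoted command.
total=$(printf 'chr1 1 5 1\nchr1 8 12 1\nchr1 18 20 1\nchr1 21 25 0\n' | bash -c "$fixed")
echo "$total"
```

So if you would rather keep the one-liner, escaping the dollar sign as \$4 inside the double-quoted command should also work, but the function is easier to read and harder to get wrong.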