
Merge series and recognize end of it - AWK

Tags: merge, awk

Could you please help me merge rows by the coordinates in column $2? The coordinates form series that grow by one. I want output like this: rows 1 to 4 are merged into 9079811-9079814; after that the series ends, so the next series is merged into another row, and so on. For the 3rd column of the input I would like to compute the average.

I wrote a script, but it merges all rows from the first coordinate to the last one, with no condition on the series.

 awk -F'\t' -v OFS="\t" '{print $2,$4,$3,$1}' input | awk '!x[$2]{x[$2]=$1}y[$2]<$1{y[$2]=$1;}x[$2]>$1{x[$2]=$1} {sum+=$3} END{for(i in y)print $1,x[i],y[i],sum/NR,i}' | sort -V -k1,1 > output

INPUT:

chr12   9079811 29  A2M
chr12   9079812 29  A2M
chr12   9079813 29  A2M
chr12   9079814 28  A2M
chr12   9091202 5   A2M
chr12   9091203 5   A2M
chr12   9091204 5   A2M
chr12   9091390 15  A2M
chr12   9091391 15  A2M
chr12   9091392 13  A2M

OUTPUT:

chr12  9079811  9079814 28.75 A2M
chr12  9091202  9091204 5     A2M
chr12  9091390  9091392 14.3  A2M
Vonton asked Nov 27 '25 01:11



1 Answer

Awk solution:

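awk 's{
         if ($2-prev == 1) { sum += $3; c++; prev=$2 }           # row extends the current series
         else { print chr, s, prev, sum/c, gene; s=sum=c=0 }     # series ended: print it, then reset
     }
     !s{ s=prev=$2; sum=$3; c++; chr=$1; gene=$4 }               # start a new series at this row
     END{ if (c) print chr, s, prev, sum/c, gene }' file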
  • s - start coordinate of the series currently being processed (e.g. 9079811); it is reset to 0 when a series ends
  • prev - the most recent coordinate seen in the current series

The output:

chr12 9079811 9079814 28.75 A2M
chr12 9091202 9091204 5 A2M
chr12 9091390 9091392 14.3333 A2M
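
If the real data contains more than one chromosome or gene, or if the mean should be rounded, the same idea can be extended. The following is only a sketch under assumptions that are not in the question: a series also ends when column 1 or column 4 changes, the mean is rounded to two decimals with sprintf, and the output is tab-separated. The names merge_series and emit are made up for illustration, and the file name input is taken from the question's own command.

```shell
#!/bin/sh
# Sample rows copied from the question, written to a file named "input".
cat > input <<'EOF'
chr12   9079811 29  A2M
chr12   9079812 29  A2M
chr12   9079813 29  A2M
chr12   9079814 28  A2M
chr12   9091202 5   A2M
chr12   9091203 5   A2M
chr12   9091204 5   A2M
chr12   9091390 15  A2M
chr12   9091391 15  A2M
chr12   9091392 13  A2M
EOF

# merge_series FILE: hypothetical helper that collapses runs of consecutive
# coordinates (column 2) into one row with the mean of column 3.
merge_series() {
    awk -v OFS='\t' '
        # Print the series collected so far, if there is one.
        function emit() {
            if (n) print chr, start, prev, sprintf("%.2f", sum / n), gene
        }
        # A row continues the series only if chromosome and gene match
        # and the coordinate is exactly one more than the previous one.
        n && $1 == chr && $4 == gene && $2 == prev + 1 {
            sum += $3; n++; prev = $2
            next
        }
        # Any other row closes the current series and opens a new one.
        {
            emit()
            chr = $1; gene = $4; start = prev = $2; sum = $3; n = 1
        }
        END { emit() }
    ' "$1"
}

merge_series input
```

For the sample input this prints three tab-separated rows with the means 28.75, 5.00 and 14.33. Keeping the chromosome and gene in variables means each printed row describes the series that just ended and not the row that interrupted it.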
RomanPerekhrest answered Nov 29 '25 22:11



