
awk command to select values from intervals defined by pairs of columns

Tags:

awk

I am trying to design an awk command that selects lines where the value in column 2 falls within a range defined by pairing specific columns of that line. The application is calling single nucleotide polymorphisms that are not within 50 nucleotides of an exon boundary. The file looks like this:

ID  X   start   end start   end start   end start   end  
Fal1825_c6  802 2   62  62  239 239 362 362 934  
Fal1821_c2  152 1   19  22  159 159 263 264 398  
Fal18279_c7 41  1   177 177 598                 
Fal18376_c3 367 1   251 251 421                 
Fal18748_c2 601 1   152 152 489 489 499 499 677  
Fal18748_c2 500 1   152 152 489 489 499 499 677  
Fal18792_c3 750 1   234 234 459 459 762 762 83  
Fal19487_c2 89  1   177 177 270 270 409 411 459  

I want to print only the lines where the value in the second column falls between ("start" + 50) and ("end" - 50) for any "start"/"end" pair on that line. Pairs are formed only from adjacent "start" and "end" columns, i.e. between ($3+50 and $4-50), or ($5+50 and $6-50), or ($7+50 and $8-50), and so on, considering all the start-end column pairs for the component.

The output would look like:

ID  X   start   end start   end start   end start   end  
Fal1825_c6  802 2   62  62  239 239 362 362 934  
Fal18376_c3 367 1   251 251 421             
Fal18748_c2 601 1   152 152 489 489 499 499 677  
Fal19487_c2 89  1   177 177 270 270 409 411 459  

My attempted command looked like this:

awk '{a=3; b=4; while ($a > 0) do {if ($2 > ($a + 50) && $2 < ($b + 50)){print $0} else {a+2, b+2} }'

Thank you

Cris asked Dec 12 '25 15:12


1 Answer

Try:

awk '{
for (i = 3; i <= NF; i += 2)
  if ($2 > $i+50 && $2 < $(i+1)-50) { print; next } 
}' FILE
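
One thing to watch: the expected output in the question keeps the header line, but the loop above will most likely drop it, because "X" in the header is not a number and the comparison falls back to string ordering. A minimal sketch of a variant that passes the header through, assuming the header is always the first line of the file. The function name `filter_snps` is made up for illustration, and the loop bound is tightened to `i < NF` so that every "start" has an "end" to pair with:

```shell
# filter_snps is a hypothetical helper name; it reads the table named by its first argument.
filter_snps() {
  awk '
    NR == 1 { print; next }   # assumption: line 1 is the header, pass it through as-is
    {
      # walk the (start, end) pairs: fields 3-4, 5-6, 7-8, ...
      for (i = 3; i < NF; i += 2)
        if ($2 > $i + 50 && $2 < $(i + 1) - 50) { print; next }
    }
  ' "$1"
}
```

Called as `filter_snps FILE`, this should give the header followed by the four matching rows listed in the question.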
yazu answered Dec 16 '25 07:12