Using AWK to find the smallest and largest number in a column?

Tags: linux, awk
I have a file with a few columns, and I want to use an AWK command to show the largest and the smallest number in a particular column.

Example:

a  212
b  323
c  23
d  45
e  54
f  102

I want one command to show that the lowest number is 23 and another command to show that the highest number is 323.

I have no idea why the answers are not working! Here is a more realistic example of my file (I should mention that it is tab-delimited):

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  rmdup_wl_25248.bam
Chr10   247     .       T       C       7.8     .       DP=37;AF1=0.5;CI95=0.5,0.5;DP4=7,1,19,0;MQ=15;FQ=6.38;PV4=0.3,1,0.038,1 GT:PL:GQ        0/1:37,0,34:36
Chr10   447     .       A       C       75      .       DP=30;AF1=1;CI95=1,1;DP4=0,0,22,5;MQ=14;FQ=-108 GT:PL:GQ        1/1:108,81,0:99
Chr10   449     .       G       C       35.2    .       DP=33;AF1=1;CI95=0.5,1;DP4=3,2,20,3;MQ=14;FQ=-44;PV4=0.21,1.7e-06,1,0.34        GT:PL:GQ        1/1:68,17,0:31
Chr10   517     .       G       A       222     .       DP=197;AF1=1;CI95=1,1;DP4=0,0,128,62;MQ=24;FQ=-282      GT:PL:GQ        1/1:255,255,0:99
Chr10   761     .       G       A       27      .       DP=185;AF1=0.5;CI95=0.5,0.5;DP4=24,71,8,54;MQ=20;FQ=30;PV4=0.07,8.4e-50,1,1     GT:PL:GQ        0/1:57,0,149:60
Chr10   1829    .       A       G       3.01    .       DP=74;AF1=0.4998;CI95=0.5,0.5;DP4=18,0,54,0;MQ=19;FQ=4.68;PV4=1,9.1e-12,0.003,1 GT:PL:GQ        0/1:30,0,45:28

I should add that I have already excluded the lines that start with #. These are the commands I use:

awk '$1 !~/#/' | awk -F'\t' 'BEGIN{first=1;} {if (first) { max = min = $6; first = 0; next;} if (max < $6) max=$6; if (min > $6) min=$6; } END { print min, max }' wl_25210_filtered.vcf

awk '$1 !~/#/' | awk -F'\t' 'BEGIN{getline;min=max=$6} NF{ max=(max>$6)?max:$6 min=(min>$6)?$6:min} END{print min,max}' wl_25210_filtered.vcf

and

awk '$1 !~/#/' | awk -F'\t' '
NR==2{min=max=$6;next}
NR>2 && NF{
    max=(max>$6)?max:$6
    min=(min>$6)?$6:min
}
END{print min,max}' wl_25210_filtered.vcf

mahmood asked Dec 22 '11 13:12


2 Answers

If your file contains empty lines, neither of the posted solutions will work. For correct handling of empty lines try this:

$ cat f.awk
BEGIN{getline;min=max=$6}
NF{
    max=(max>$6)?max:$6
    min=(min>$6)?$6:min
}
END{print min,max} 

Then run this command:

sed "/^#/d" my_file | awk -f f.awk

First, getline in the BEGIN block reads the first line of the input and uses its sixth field to initialize min and max. Then, for each non-empty line, the ternary operator checks whether a new min or max has been found. At the end, the result is printed.
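As a variation on the script above, the header filtering can be done inside awk itself, so that only one awk process runs and the file name is passed to that same process. The following is a minimal sketch, not a definitive implementation: the function name col_minmax and its default column of 6 are assumptions made for illustration, and adding 0 to the field is one way to force a numeric rather than a string comparison.

```shell
# Hypothetical helper: print the min and max of one column of a tab-separated file.
# Usage: col_minmax FILE [COLUMN]   (COLUMN defaults to 6, an assumption for this example)
col_minmax() {
    awk -F'\t' -v col="${2:-6}" '
        /^#/ || !NF { next }          # skip lines starting with # and empty lines
        {
            v = $col + 0              # adding 0 forces a numeric value
            if (!seen++) {            # first data line initializes both values
                min = max = v
                next
            }
            if (v > max) max = v
            if (v < min) min = v
        }
        END { if (seen) print min, max }
    ' "$1"
}
```

Called as col_minmax wl_25210_filtered.vcf, this reads the file directly. For the six data lines shown in the question it would print 3.01 222.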

HTH Chris

Chris answered Sep 21 '22 05:09


You can create two user-defined functions and use them as needed. This offers a more generic solution.

[jaypal:~/Temp] cat file
a  212
b  323
c  23
d  45
e  54
f  102
[jaypal:~/Temp] awk '
function max(x){i=0;for(val in x){if(i<=x[val]){i=x[val];}}return i;}
function min(x){i=max(x);for(val in x){if(i>x[val]){i=x[val];}}return i;}
{a[$2]=$2;next}
END{minimum=min(a);maximum=max(a);print "Maximum = "maximum " and Minimum = "minimum}' file
Maximum = 323 and Minimum = 23

The solution above has two user-defined functions, max and min. Column 2 is stored in an array; you can store any of your columns the same way. In the END block, you call the functions, store the results in variables, and print them.
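One caveat with the functions above: max() starts from 0, so a column containing only negative numbers would report 0 as the maximum, and i and val are global variables shared by both functions. The sketch below is one possible adjustment, not the only one: the wrapper name minmax_array and the col variable are assumptions for illustration, and the extra function parameters are the usual awk idiom for local variables.

```shell
# Hypothetical wrapper around the array-based approach, with two changes:
# loop variables are local, and the search starts from a real element instead of 0.
# Usage: minmax_array FILE [COLUMN]   (COLUMN defaults to 2, as in the small example)
minmax_array() {
    awk -v col="${2:-2}" '
        # Extra parameters (i, k, first) act as local variables in awk.
        function max(x,   i, k, first) {
            first = 1
            for (k in x) if (first || x[k] > i) { i = x[k]; first = 0 }
            return i
        }
        function min(x,   i, k, first) {
            first = 1
            for (k in x) if (first || x[k] < i) { i = x[k]; first = 0 }
            return i
        }
        /^#/ || !NF { next }          # skip lines starting with # and empty lines
        { a[$col] = $col + 0 }        # store the column as numbers
        END { print "Maximum = " max(a) " and Minimum = " min(a) }
    ' "$1"
}
```

Run as minmax_array file on the six-line example, this prints the same line as the original command: Maximum = 323 and Minimum = 23.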

Hope this helps!

Update:

I ran the following against the latest example:

[jaypal:~/Temp] awk '
function max(x){i=0;for(val in x){if(i<=x[val]){i=x[val];}}return i;}
function min(x){i=max(x);for(val in x){if(i>x[val]){i=x[val];}}return i;}
/^#/{next}
{a[$6]=$6;next}
END{minimum=min(a);maximum=max(a);print "Maximum = "maximum " and Minimum = "minimum}' sample
Maximum = 222 and Minimum = 3.01

jaypal singh answered Sep 21 '22 05:09