
median of column with awk

Tags: bash, sed, awk, median

How can I use AWK to compute the median of a column of numerical data?

I can think of a simple algorithm, but I can't seem to program it. What I have so far is:

sort | awk 'END{print NR}' 

This gives me the number of elements in the column. I'd like to use it to print a particular row of the sorted data: if NR/2 is not an integer, I round up to the nearest integer and that row is the median; otherwise I take the average of rows NR/2 and (NR/2)+1.

Nick asked May 29 '11 07:05




2 Answers

With awk you have to store the values in an array and compute the median at the end, assuming we look at the first column:

sort -n file | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'

That prints the upper of the two middle values when the number of rows is even. For a true median, average the two middle values in that case:

sort -n file | awk ' { a[i++]=$1; }
    END { x=int((i+1)/2); if (x < (i+1)/2) print (a[x-1]+a[x])/2; else print a[x-1]; }'
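
The same idea can be wrapped in a small shell function that takes the column number as an argument. This is only a sketch: the name `median_col` and its argument order are made up for illustration, it assumes whitespace-separated columns, and it prints nothing when the input is empty.

```shell
# median_col COLUMN [FILE...]: hypothetical helper, not part of the answer above.
# Prints the median of the given whitespace-separated column.
median_col() {
    col=$1
    shift
    # Sort numerically on the chosen column, then pick the middle value(s).
    sort -n -k "$col,$col" "$@" | awk -v col="$col" '
        { a[n++] = $col }                       # store values with 0-based indices
        END {
            if (n == 0) exit                    # empty input: print nothing
            if (n % 2) print a[(n - 1) / 2]     # odd count: single middle value
            else       print (a[n / 2 - 1] + a[n / 2]) / 2   # even count: mean of the two
        }'
}
```

Usage would look like `median_col 2 data_file`, or `some_command | median_col 1` to read from standard input.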

maxschlepzig answered Oct 28 '22 00:10


This awk program assumes one column of numerically sorted data:

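#!/usr/bin/awk -f
# Median of the first column; the input must already be sorted numerically.
{
    value[NR] = $1    # remember every value, indexed by line number
}
END {
    if (NR % 2) {
        # Odd count: the single middle value.
        print value[(NR + 1) / 2]
    } else {
        # Even count: the mean of the two middle values.
        print (value[NR / 2] + value[NR / 2 + 1]) / 2.0
    }
}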

Sample usage:

sort -n data_file | awk -f median.awk
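
Because the program silently gives a wrong answer when the input is not sorted, a guarded variant may be useful. The following is a rough sketch rather than part of the answer: the function name `median_sorted` is hypothetical, it reads one number per line from standard input, and it assumes `/dev/stderr` is writable (true for gawk and on typical Linux systems).

```shell
# median_sorted: hypothetical wrapper; expects numerically sorted input on stdin.
median_sorted() {
    awk '
        # Stop reading as soon as a value is smaller than the previous one.
        NR > 1 && $1 + 0 < prev { unsorted = 1; exit }
        { v[NR] = $1; prev = $1 + 0 }
        END {
            if (unsorted) {
                print "median_sorted: input is not sorted (line " NR ")" > "/dev/stderr"
                exit 1
            }
            if (NR == 0) exit 1              # empty input: no median to report
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

With this, `sort -n data_file | median_sorted` behaves like the script above, while unsorted or empty input produces a non-zero exit status instead of a misleading number.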

Johnsyweb answered Oct 28 '22 00:10