
awk: find minimum and maximum in column

Tags: bash, awk

I'm using awk to process a simple .dat file that contains several lines of data; each line has 4 columns separated by a single space. I want to find the minimum and maximum of the first column.

The data file looks like this:

9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496

The commands I used are as follows.

min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a=   0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`

However, the output is min=10 and max=9.

(Similar commands return the correct minimum and maximum for the second column.)

Could someone tell me where I went wrong? Thank you!

Wang Zong'an asked Apr 21 '15 22:04





2 Answers

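Awk guesses the type of each value.

As strings, "10" is less than "4" because the character "1" sorts before "4". In your commands, the stray fi (a shell keyword, not awk syntax) is treated as an empty awk variable, so a=$1 fi concatenates and stores a string in a; every later comparison is then a string comparison. Drop the fi and force a numeric conversion by adding zero: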

min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a=   0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`
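The difference between the two comparison modes can be checked directly. The following is a minimal sketch, not part of the original answer: it first compares "10" and "4" as strings and as numbers, then computes both extremes in one pass by seeding them from the first record instead of from the guesses 1000 and 0. The temporary file and the variable names (datafile, str_cmp, num_cmp, result) are assumptions made for illustration.

```shell
# Hypothetical sample file holding only the first column of the question's data.
datafile=$(mktemp)
printf '%s\n' 9 9 10 10 11 11 12 12 4 4 5 5 6 6 7 7 8 8 > "$datafile"

# String comparison: "10" sorts before "4" because '1' comes before '4'.
str_cmp=$(awk 'BEGIN { if ("10" < "4") print "less"; else print "not less" }')
# Numeric comparison: adding 0 converts both operands to numbers first.
num_cmp=$(awk 'BEGIN { if ("10"+0 < "4"+0) print "less"; else print "not less" }')
echo "as strings: $str_cmp, as numbers: $num_cmp"

# Single pass: seed min and max from the first record, then compare numerically.
result=$(awk 'NR == 1 { min = max = $1 + 0; next }
              { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
              END { if (NR > 0) print min, max }' "$datafile")

# Split the "min max" pair with POSIX parameter expansion.
min=${result% *}
max=${result#* }
echo "min=$min max=$max"   # prints: min=4 max=12
rm -f "$datafile"
```

Seeding from the first record avoids wrong results when every value is above 1000 or below 0.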
Klaus Zeuge answered Oct 18 '22 01:10



A non-awk answer:

cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
  > >(echo "max=$(tail -1)")

That tee command is perhaps a bit too clever. tee duplicates its stdin stream to the files named as arguments, and it also streams the same data to stdout. I'm using process substitutions to filter the streams.
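The duplicating behaviour of tee is easier to see with a regular file than with process substitutions. This is a simplified sketch under that assumption, so it also runs in a plain POSIX shell; the scratch directory, the file names col1.txt and sorted.txt, and the sample values are made up for illustration.

```shell
# Hypothetical scratch directory and sample first-column values.
workdir=$(mktemp -d)
printf '%s\n' 9 10 12 4 7 > "$workdir/col1.txt"

# tee writes the sorted stream to sorted.txt and also passes it on to tail.
max=$(sort -n "$workdir/col1.txt" | tee "$workdir/sorted.txt" | tail -n 1)
# The file copy holds the full sorted stream, so its first line is the minimum.
min=$(head -n 1 "$workdir/sorted.txt")

echo "min=$min"   # prints: min=4
echo "max=$max"   # prints: max=12
rm -rf "$workdir"
```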

The same effect can be achieved (with less flourish) by extracting the first and last lines of the sorted stream:

cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'

or

cut -d" " -f1 file | sort -n | {
    read -r line
    echo "min=$line"
    max=$line    # so that single-line input still yields a maximum
    while read -r line; do max=$line; done
    echo "max=$max"
}
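The pipelines above print min= and max= lines but do not set shell variables, which is what the question asked for. One possible way to capture both values from a single sort pass is sketched below; the temporary file and the variable name sorted are assumptions, and the rows are a subset of the question's data.

```shell
# Hypothetical sample file with a few rows taken from the question.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
9 30 8.58939 167.759
10 30 6.69505 169.529
12 31 9.6747 167.049
4 30 10.7602 169.611
7 31 -2.70843 167.116
EOF

# Sort the first column numerically once and keep the result in a variable.
sorted=$(cut -d" " -f1 "$datafile" | sort -n)
# The first line of the sorted list is the minimum, the last line the maximum.
min=$(printf '%s\n' "$sorted" | head -n 1)
max=$(printf '%s\n' "$sorted" | tail -n 1)

echo "min=$min max=$max"   # prints: min=4 max=12
rm -f "$datafile"
```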
glenn jackman answered Oct 18 '22 01:10
