
normalize column data with maximum value of that column

Tags:

awk

gawk

I have a data file with two columns. I want to find the maximum value in the second column and divide each entry in that column by it, so that every entry in the second column is <= 1.00.

I tried the command below:

awk 'BEGIN {max = 0} {if ($2>max) max=$2} {print  ($2/max)}' angleOut.dat

but I get the error message below:

awk: (FILENAME=angleOut.dat FNR=1) fatal: division by zero attempted

Note: some entries in the second column are zero. Dividing zero by the maximum should give zero, but instead I get the error above.

Could I get any help for this?

Many thanks in advance.

Vijay asked Mar 17 '23 19:03



2 Answers

Let's take this as the sample input file:

$ cat >file
1 5
2 2
3 7
4 6

This awk script will normalize the second column:

$ awk 'FNR==NR{max=($2+0>max)?$2:max;next} {print $1,$2/max}' file file
1 0.714286
2 0.285714
3 1
4 0.857143

This script reads through the input file twice. The first time, it finds the maximum. The second time, it prints the lines with the second column normalized.

The Ternary Statement

Consider:

max=($2+0>max)?$2:max

This is a compact form of an if-then-else statement. The "if" part is $2+0>max. If this evaluates to true, the value following the ? is assigned to max. If it is false, then the value following the : is assigned to max.

The more explicit form of an if statement works well too.
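
For comparison, here is a sketch of the same two-pass command written with an ordinary if statement instead of the ternary. A temporary file stands in for the sample file shown earlier; the names input and normalized are only illustrative.

```shell
# A temporary file stands in for the sample input file.
input=$(mktemp)
printf '1 5\n2 2\n3 7\n4 6\n' > "$input"

# First pass (FNR==NR): track the maximum of column 2 with a plain if.
# Second pass: print column 1 and the normalized column 2.
normalized=$(awk 'FNR==NR { if ($2+0 > max) max = $2; next }
                  { print $1, $2/max }' "$input" "$input")
echo "$normalized"

rm -f "$input"
```

The output should match that of the ternary version, since only the way max gets updated has changed.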

Also, note the incantation $2+0. In awk, variables can be strings or numbers according to context. In string context, > compares lexicographic ordering. We want a numeric comparison. By adding zero to $2, we remove all doubt and force awk to treat $2 as a number.
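
To see the difference between the two contexts, here is a small sketch that compares the same pair of values once as strings and once as numbers. The values 10 and 9 and the variable names are illustrative, not taken from the question's data.

```shell
# String comparison: "10" sorts before "9" because the character "1" < "9".
as_string=$(awk 'BEGIN { a = "10"; b = "9";
                         r = (a > b) ? "10 is larger" : "9 is larger"; print r }')

# Numeric comparison: adding 0 converts both operands to numbers first.
as_number=$(awk 'BEGIN { a = "10"; b = "9";
                         r = (a+0 > b+0) ? "10 is larger" : "9 is larger"; print r }')

echo "string context:  $as_string"
echo "numeric context: $as_number"
```

In string context 9 wins, while in numeric context 10 wins, which is the behavior the +0 is meant to guarantee.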

John1024 answered Apr 25 '23 02:04



You cannot determine max before seeing the whole file, so you need two passes. This one uses two awk executions to get the normalized output:

awk -v max="$(awk 'max < $2 { max = $2 } END { print max }' angleOut.dat)" \
    '{print $2 / max}' angleOut.dat
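
Note that if the maximum comes out as zero (for example, when every value in the second column is zero), the division step fails with the same "division by zero" error as in the question. Below is a minimal sketch of one way to guard against that, assuming that printing 0 for every row is acceptable in that case. The all-zero data and the names input, max, and guarded are illustrative.

```shell
# Illustrative input whose second column is all zeros.
input=$(mktemp)
printf '1 0\n2 0\n' > "$input"

# First execution: find the maximum; "+ 0" makes sure a number is printed.
max=$(awk 'max < $2 { max = $2 } END { print max + 0 }' "$input")

# Second execution: divide only when max is non-zero, otherwise print 0.
guarded=$(awk -v max="$max" '{ print (max == 0 ? 0 : $2 / max) }' "$input")
echo "$guarded"

rm -f "$input"
```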
perreal answered Apr 25 '23 02:04
