
Why does AWK refuse to sum up floats?

I'm facing a rather strange problem with awk where I want to calculate the average of a column. This is the test input from my file:

1
2
0.4
0.250
0.225
0.221
0.220
0.218

And this is the script I'm trying to run:

awk '{sum += $1} END {print sum; print sum / NR}' ~/Desktop/bar.txt

What I expect as output is:

<calculated sum>
<calculated average>

But this is what I invariably get:

3
0,375

I've checked the formatting and characters of the input file etc., but I can't get awk to sum up those pesky floats.

Any ideas?

I'm running awk version 20070501 in bash 3.2.48 on OS X 10.8.5.

Update

As @sudo_O correctly deduced, the problem is my locale. Replacing the . with a , in the file yields the correct results. That's obviously not the solution I'm looking for, though, so I need to do something with my locale, which is currently set to:

$ locale
LANG="de_CH.UTF-8"
LC_COLLATE="de_CH.UTF-8"
LC_CTYPE="de_CH.UTF-8"
LC_MESSAGES="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME="de_CH.UTF-8"
LC_ALL=

I think I'd like to keep the numeric, monetary and date locales. Which locale do I need to change (and how) to make awk work?

Max Leske asked Sep 22 '13 17:09


1 Answer

The problem is not awk here. Explicitly use floats and see what you get:

$ awk '{sum+=sprintf("%f",$1)}END{printf "%.6f\n%.6f\n",sum,sum/NR}' file
4.534000
0.566750

It is probably your locale, since your output uses , as the decimal separator. Please post the output of the locale command.


Using your LC_NUMERIC, I can reproduce your results:

$ LC_NUMERIC="de_CH.UTF-8" awk '{sum += $1} END {print sum; print sum / NR}' file
3
0,375

The fix is to set your LC_NUMERIC or LC_ALL to C, or to any other locale that uses . as the decimal separator:

$ LC_NUMERIC="C" awk '{sum += $1} END {print sum; print sum / NR}' file
4.534
0.56675

See man locale for more information.
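If you want to keep de_CH.UTF-8 everywhere else, one option is to override the locale only for the awk call. The following is a minimal sketch, not a definitive setup: the temporary sample file and the cawk wrapper function are hypothetical names made up for illustration. It uses LC_ALL=C rather than LC_NUMERIC=C because LC_ALL, when set, takes precedence over the individual LC_* variables.

```shell
# Hypothetical sample file holding the same values as in the question.
sample=$(mktemp)
printf '%s\n' 1 2 0.4 0.250 0.225 0.221 0.220 0.218 > "$sample"

# Hypothetical helper: run awk with "." as the decimal separator for this
# call only. The assignment prefix applies to the awk process alone, so the
# shell's own LANG / LC_* settings stay untouched.
cawk() {
    LC_ALL=C awk "$@"
}

result=$(cawk '{sum += $1} END {print sum; print sum / NR}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Note that LC_ALL=C also switches character handling to the C locale for that one call, which may matter if the script processes non-ASCII text. In that case, LC_NUMERIC=C alone should be enough as long as LC_ALL is empty, as it is in the locale output above.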

Chris Seymour answered Nov 20 '22 14:11