I'm facing a rather strange problem with awk, where I want to calculate the average of a column. This is the test input from my file:
1
2
0.4
0.250
0.225
0.221
0.220
0.218
And this is the script I'm trying to run:
awk '{sum += $1} END {print sum; print sum / NR}' ~/Desktop/bar.txt
What I expect as output is:
<calculated sum>
<calculated average>
But this is what I get invariably:
3
0,375
I've checked the formatting and characters of the input file etc., but I can't get awk to sum up those pesky floats.
Any ideas?
I'm running awk version 20070501 in bash 3.2.48 on OS X 10.8.5.
As @sudo_O correctly deduced, the problem is my locale. Replacing the . with a , in the file yields the correct results. That's obviously not the solution I'm looking for, though, so I need to do something with my locale, which is currently set to:
$ locale
LANG="de_CH.UTF-8"
LC_COLLATE="de_CH.UTF-8"
LC_CTYPE="de_CH.UTF-8"
LC_MESSAGES="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME="de_CH.UTF-8"
LC_ALL=
I'd like to keep the numeric, monetary and date locales, I think. Which locale do I need to change (and how) to make awk work?
The problem is not awk here. Explicitly use floats and see what you get:
$ awk '{sum+=sprintf("%f",$1)}END{printf "%.6f\n%.6f\n",sum,sum/NR}' file
4.534000
0.566750
It looks like it's probably your locale, as your output uses a , as the decimal separator, so post the output of the locale command.
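A quick way to check which decimal separator the active locale uses is to query the decimal_point keyword with the locale utility. This is a minimal sketch, assuming a locale utility that supports keyword queries (as POSIX specifies); the first command's output depends on your own settings:

```shell
# Show the decimal separator of the locale currently in effect.
locale decimal_point

# For comparison, the C locale always uses a period.
LC_ALL=C locale decimal_point
```

If the first command prints a comma, awk may stop parsing at the . in values such as 0.250.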
So using your LC_NUMERIC I can reproduce your results:
$ LC_NUMERIC="de_CH.UTF-8" awk '{sum += $1} END {print sum; print sum / NR}' file
3
0,375
The fix is to set your LC_NUMERIC or LC_ALL to C or anything else that uses . as the decimal separator:
$ LC_NUMERIC="C" awk '{sum += $1} END {print sum; print sum / NR}' file
4.534
0.56675
See man locale for more information.
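If you want to keep your de_CH settings everywhere else, one option is to override the numeric locale only for the awk invocation. The sketch below wraps that in a shell function; the name column_average is hypothetical, chosen for illustration. It assumes LC_ALL is empty, as in the locale output from the question, because a non-empty LC_ALL would take precedence over LC_NUMERIC.

```shell
# Hypothetical helper: print the average of the first column of a file.
# The LC_NUMERIC=C prefix applies to this awk process only, so the
# shell's own locale settings stay untouched.
column_average() {
    LC_NUMERIC=C awk '
        { sum += $1 }                       # accumulate the first field of every line
        END { if (NR > 0) print sum / NR }  # guard against an empty file
    ' "$1"
}
```

Calling column_average ~/Desktop/bar.txt on the sample input from the question should then print 0.56675.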