
Greater than float with awk

Tags: bash, macos, awk

I found some questions about this, but none of them really answered my question.

I have a tabulated file like this:

2   10610   0   0   0   0.0105292
2   10649   0   0   0   0.041959
2   10682   0   0   0   0.0449746
2   10705   0   0   0   0.0441639
2   10797   2   0   0   0.0342728
2   10955   0   0   0   0.0136986
2   10957   0   0   0   0.0135135
2   11124   0   0   0   0.0583367
2   11336   1   0   0   0.0219502

and I used this command:

awk '{if ($6 > 0.4) print $6}' myfile

And here is the output:

0.0105292
0.041959
0.0449746
0.0441639
0.0342728
0.0136986
0.0135135
0.0583367
0.0219502

It returns every value in the 6th column. I should get no results here, since none of the values satisfies the condition. So I guess awk is not treating $6 as a float.

I tried other syntax but I still have the same problem.

I also tried the command on the first column and there it's working...

ps: I'm on MacOSX

Edit: It does work when I just use awk '{print $6}'

D Prat asked Jan 02 '23 12:01


1 Answer

It's your locale setting (see https://www.gnu.org/software/gawk/manual/gawk.html#Locales and specifically https://www.gnu.org/software/gawk/manual/gawk.html#Locale-influences-conversions). Explicitly setting LC_ALL=C is one way to solve the problem:

LC_ALL=C awk '{if ($6 > 0.4) print $6}' myfile
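To try the fix without touching your real file, here is a minimal sketch that feeds a few rows in the same layout to awk on standard input. The third row and the variable names are hypothetical, added only so that one value actually exceeds 0.4:

```shell
# Two rows copied from the question plus one made-up row (0.5) that should match.
data='2 10610 0 0 0 0.0105292
2 11124 0 0 0 0.0583367
2 99999 0 0 0 0.5'

# LC_ALL=C makes "." the decimal point, so $6 is recognized as a number
# and the comparison with 0.4 is numeric.
result=$(printf '%s\n' "$data" | LC_ALL=C awk '$6 > 0.4 { print $6 }')
echo "$result"
```

Only the 0.5 row should be printed. The pattern-action form `$6 > 0.4 { print $6 }` is equivalent to the `if` inside a bare block used in the question.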

What's happening is that you're trying to use a decimal point of . but your locale (typical in most European countries and many others) uses , instead. So when your input contains:

0.0105292

awk does not recognize it as looking like a number in your locale, so instead it gets treated as a string. If your input was instead:

0,0105292

THEN awk would recognize it as a number (so this is the other way to solve your problem - use commas as the decimal point in your input).
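A quick way to see which decimal character awk is working with is to have it print a fractional number. This is only a rough probe: whether output follows the locale depends on the awk implementation, so treat the second line as a hint rather than proof. The variable name is made up for the example:

```shell
# Force the C locale, where the decimal point is always ".".
c_locale_out=$(LC_ALL=C awk 'BEGIN { printf "%.1f\n", 0.5 }')
echo "C locale:       $c_locale_out"

# Same probe with the session's own locale settings. On awks that honor
# LC_NUMERIC for output this may print 0,5 instead of 0.5.
echo "session locale: $(awk 'BEGIN { printf "%.1f\n", 0.5 }')"
```

Running `locale` with no arguments lists the LC_* settings in effect for the session.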

So, to awk, your code:

$6 > 0.4

is a string "0.0105292" being compared to a number 0.4 (per POSIX the . is always the decimal point when used in the code) and per this comparison table from the gawk manual:

        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
        |
STRING  |       string          string          string
        |
NUMERIC |       string          numeric         numeric
        |
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------

we see that the type of comparison performed when a string is compared to a number (or anything else) is a string comparison.
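The difference between the two kinds of comparison is easy to see with constants, where the type is unambiguous. A small sketch (variable names are made up, and LC_ALL=C is set so the result does not depend on your locale); awk prints 1 for true and 0 for false:

```shell
# String constants on both sides: compared character by character,
# and "1" sorts before "9", so "10" < "9" is true.
string_cmp=$(LC_ALL=C awk 'BEGIN { print ("10" < "9") }')

# Numeric constants on both sides: compared by value, so 10 < 9 is false.
numeric_cmp=$(LC_ALL=C awk 'BEGIN { print (10 < 9) }')

# String constant against a number: per the table this is a string
# comparison, so 9 is converted to "9" first and the result is true again.
mixed_cmp=$(LC_ALL=C awk 'BEGIN { print ("10" < 9) }')

echo "string: $string_cmp, numeric: $numeric_cmp, mixed: $mixed_cmp"
```

The mixed case is the one that matters here: once $6 is treated as a plain string, the 0.4 on the other side no longer makes the comparison numeric.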

So in your original code the string "0.0105292" is being string-compared with the number 0.4, and awk is apparently deciding that the former is greater than the latter (I don't know why; maybe some other locale effect).
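One possible explanation, offered as an assumption rather than something confirmed above: if awk converts 0.4 to a string using the locale's decimal comma, it becomes "0,4". Comparing "0.0105292" with "0,4" character by character, the first characters are equal and the second pair is "." against ",", and "." (ASCII 46) sorts after "," (ASCII 44). The sketch below reproduces just that string comparison with two literals under LC_ALL=C, so it needs no comma locale installed; the variable names are made up:

```shell
# "." sorts after "," in byte order, so this comparison is true (prints 1).
with_comma=$(LC_ALL=C awk 'BEGIN { print ("0.0105292" > "0,4") }')

# For contrast: against "0.4" the third characters decide ("0" before "4"),
# so the comparison is false (prints 0).
with_dot=$(LC_ALL=C awk 'BEGIN { print ("0.0105292" > "0.4") }')

echo "vs \"0,4\": $with_comma   vs \"0.4\": $with_dot"
```

If that is what happens, every value in the question's file starting with "0." would compare greater than "0,4", which matches the output shown. Collation in a real comma locale may differ from plain byte order, so this remains a guess.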

Ed Morton answered Jan 05 '23 15:01