If I'm not mistaken, awk parses a number depending on the OS language. For example,
echo "1,2" | awk '{printf("%f\n",$1)}'
would be interpreted as 1 on an English system and as 1.2 on a system where a comma separates the integer part from the decimal part.
I don't know if the C printf does this too, so I added the C tag.
I would like to modify the previous command so that it returns the same value (1.2) regardless of the system being used.
Welcome to the ugliness of locale. To fix your problem, first set the locale to the C one.
export LC_NUMERIC=C
echo "1,2" | awk '...your code...'
To turn off other locale-dependent tomfoolery, you can
export LC_ALL=C
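LC_ALL takes precedence over LC_NUMERIC and LANG, so the setting can also be scoped to a single command instead of being exported for the whole session. A minimal sketch; the function name print_float is made up for illustration:

```shell
# Hypothetical helper: format the first field as a float using the C locale,
# without changing the locale of the calling shell.
print_float() {
    # The assignment applies only to this one awk invocation.
    LC_ALL=C awk '{ printf("%f\n", $1) }'
}

echo "1.2" | print_float
```

Note that in the C locale the decimal point is a period, so input written as "1,2" would still be read as 1 here.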
If you're using gawk, you can use the --use-lc-numeric option.
$ LC_NUMERIC=de_DE.UTF-8 awk 'BEGIN {printf("%f\n", "1,2")}'
1.000000
$ LC_NUMERIC=de_DE.UTF-8 awk --use-lc-numeric 'BEGIN {printf("%f\n", "1,2")}'
1,200000
From the GAWK manual
The POSIX standard says that awk always uses the period as the decimal point when reading the awk program source code, and for command-line variable assignments (see Other Arguments). However, when interpreting input data, for print and printf output, and for number to string conversion, the local decimal point character is used. Here are some examples indicating the difference in behavior, on a GNU/Linux system:
$ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3.14159
$ LC_ALL=en_DK gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3,14159
$ echo 4,321 | gawk '{ print $1 + 1 }'
-| 5
$ echo 4,321 | LC_ALL=en_DK gawk '{ print $1 + 1 }'
-| 5,321
The ‘en_DK’ locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal "C" locale, gawk treats ‘4,321’ as ‘4’, while in the Danish locale, it's treated as the full number, 4.321.
Some earlier versions of gawk fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, since their data used a period as the decimal point, so the default behavior was restored to use a period as the decimal point character. You can use the --use-lc-numeric option (see Options) to force gawk to use the locale's decimal point character. (gawk also uses the locale's decimal point character when in POSIX mode, either via --posix, or the POSIXLY_CORRECT environment variable.)
I get similar behavior from /usr/bin/printf:
$ LC_NUMERIC=de_DE.UTF-8 /usr/bin/printf "%f\n" "1,2"
/usr/bin/printf: 1,2: value not completely converted
1,000000
$ LC_NUMERIC=de_DE.UTF-8 /usr/bin/printf "%f\n" "1.2"
1,200000
But without the ability to override it.
If your intent is to do the opposite, that is, to take "European" input and output "US" numbers, you're going to need something more robust, possibly Python or Perl with their locale modules.
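For the simple case in the question (a single decimal comma and no thousands separators), a plain text substitution in awk may be enough. This is a rough sketch rather than a locale-aware parser, and the function name comma_to_dot is made up for illustration:

```shell
# Hypothetical helper: read numbers that use a comma as the decimal separator
# and print them with a period, regardless of the caller's locale.
comma_to_dot() {
    # Force the C locale so awk both reads and prints with a period.
    LC_ALL=C awk '{
        sub(/,/, ".", $1)     # "1,2" -> "1.2"; assumes no thousands separators
        printf("%f\n", $1)
    }'
}

echo "1,2" | comma_to_dot
```

On input such as "1.234,5", which mixes a thousands separator with a decimal comma, this substitution gives the wrong result; that is the kind of input where a proper locale module is the safer choice.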