awk is capable of parsing fields as hexadecimal numbers:
$ echo "0x14" | awk '{print $1+1}'
21 <-- correct, since 0x14 == 20
However, it does not seem to handle hexadecimal literals in the program itself (here, in a pattern):
$ echo "0x14" | awk '$1+1<=21 {print $1+1}' | wc -l
1 <-- correct
$ echo "0x14" | awk '$1+1<=0x15 {print $1+1}' | wc -l
0 <-- incorrect. awk is not properly handling the 0x15 here
Is there a workaround?
You're dealing with two similar but distinct issues here: non-decimal data in awk input, and non-decimal literals in your awk program.
See the POSIX-1.2004 awk specification, Lexical Conventions:
8. The token NUMBER shall represent a numeric constant. Its form and numeric value [...]
with the following exceptions:
a. An integer constant cannot begin with 0x or include the hexadecimal digits 'a', [...]
So awk (presumably you're using nawk or mawk) behaves "correctly". gawk (since version 3.1) supports non-decimal (octal and hex) literal numbers by default, though using the --posix switch turns that off, as expected.
The normal workaround in cases like this is to rely on the defined numeric string behaviour, where a numeric string is effectively parsed as if by the C standard atof() or strtod() function, which supports 0x-prefixed numbers:
$ echo "0x14" | nawk '$1+1<=0x15 {print $1+1}'
<no output>
$ echo "0x14" | nawk '$1+1<=("0x15"+0) {print $1+1}'
21
The problem here is that that isn't quite correct, as POSIX-1.2004 also states:
A string value shall be considered a numeric string if it comes from one of the following:
1. Field variables
...
and after all the following conversions have been applied, the resulting string would
lexically be recognized as a NUMBER token as described by the lexical conventions in Grammar
UPDATE: gawk aims for "2008 POSIX.1003.1". Note, however, that since the 2008 edition (see the awk page of the IEEE Std 1003.1, 2013 edition) the standard allows strtod() and implementation-dependent behaviour that does not require the number to conform to the lexical conventions. This should (implicitly) support INF and NAN too. The text in Lexical Conventions is similarly amended to optionally allow hexadecimal constants with 0x prefixes.
This won't behave (given the lexical constraint on numbers) quite as hoped in gawk:
$ echo "0x14" | gawk '$1+1<=0x15 {print $1+1}'
1
(note the "wrong" numeric answer, which would have been hidden by |wc -l) unless you use --non-decimal-data too:
$ echo "0x14" | gawk --non-decimal-data '$1+1<=0x15 {print $1+1}'
21
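As an alternative to --non-decimal-data, gawk's strtonum() can convert the field explicitly. The sketch below is only an illustration: it assumes gawk is installed, and the wrapper name hex_filter_gawk is made up for this example.

```shell
# hex_filter_gawk LIMIT (hypothetical wrapper name, not part of gawk):
# reads one value per line on stdin and prints value+1 in decimal
# whenever value+1 <= LIMIT. Both the input and LIMIT may be 0x-prefixed.
hex_filter_gawk() {
    gawk -v limit="$1" '
        # strtonum() is a gawk extension; it treats a leading 0x as hexadecimal
        { v = strtonum($1) + 1 }
        v <= strtonum(limit) { print v }
    '
}

# Usage:
#   echo "0x14" | hex_filter_gawk 0x15
```

Because the conversion is explicit, this does not depend on how gawk decides whether a field "looks numeric".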
See also:
This accepted answer to this SE question has a portability workaround.
The options for having the two types of support for non-decimal numbers are:
gawk, without --posix and with --non-decimal-data
If you search for "awk dec2hex" you can find many instances of the latter; a passable one is here: http://www.tek-tips.com/viewthread.cfm?qid=1352504 . If you want something like gawk's strtonum(), you can get a portable awk-only version here.
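As a rough illustration of the awk-only approach, the sketch below converts hexadecimal strings by hand using only POSIX awk string functions, so it should not depend on gawk extensions. The names filter_hex and hex2dec are made up for this example and are not taken from the linked posts.

```shell
# filter_hex LIMIT (hypothetical name): reads one hex value per line on stdin
# (with or without a 0x prefix) and prints value+1 in decimal whenever
# value+1 <= LIMIT. LIMIT is also given in hex.
filter_hex() {
    awk -v limit="$1" '
    # hex2dec (hypothetical helper): convert a hex string to a number,
    # returning -1 if the string is empty or contains a non-hex character.
    function hex2dec(s,    n, i, d) {
        s = tolower(s)
        sub(/^0x/, "", s)
        if (s == "") return -1
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789abcdef", substr(s, i, 1))
            if (d == 0) return -1
            n = n * 16 + (d - 1)
        }
        return n
    }
    BEGIN { max = hex2dec(limit) }
    { v = hex2dec($1) }
    v >= 0 && v + 1 <= max { print v + 1 }
    '
}

echo "0x14" | filter_hex 0x15    # prints 21
```

Passing the limit as a string with -v and converting it with the same helper means the program never contains a hexadecimal literal, so the behaviour should be the same whether or not the awk in use accepts 0x constants.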