In C and Java, there are defined constants representing the maximum and minimum values an integer can hold.
Are there such constants in awk? If so, what are their names?
The awk manual indicates that awk can support arbitrary-precision integer arithmetic with -M, but I'd like to know about the bounds on integers when we do not specify -M.
INT_MAX is a macro that specifies that an integer variable cannot store any value beyond this limit. INT_MIN specifies that an integer variable cannot store any value below this limit. Values of INT_MAX and INT_MIN may vary from compiler to compiler.
INT_MAX is a macro which represents the maximum integer value. Similarly, INT_MIN represents the minimum integer value. These macros are defined in the header file <limits.
(Arithmetic) Integer Overflows: An integer overflow occurs when you attempt to store in an integer variable a value that is larger than the maximum value the variable can hold. For signed integers, the C standard defines this situation as undefined behavior (meaning that anything might happen).
Not really something I've considered before, so I may be barking up the wrong tree completely, but since awk uses double-precision floating-point numbers by default, maybe what you're looking for is based on the value of PREC in gawk (see https://www.gnu.org/software/gawk/manual/gawk.html#Setting-precision). Look:
$ awk 'BEGIN{print PREC}'
53
$ awk 'BEGIN{print (2^52)}'
4503599627370496
$ awk 'BEGIN{print (2^52)+1}'
4503599627370497
$ awk 'BEGIN{print (2^PREC)}'
9007199254740992
$ awk 'BEGIN{print (2^PREC)+1}'
9007199254740992
Notice how integer arithmetic fails when you try to go beyond 2^PREC? So maybe 2^PREC is a reasonable value to use for a MAX_INT equivalent, and you could derive a MIN_INT similarly. Think about it, try it, see if it makes sense for your needs....
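The same limit can be cross-checked outside awk. The following is a minimal sketch in Python, assuming (as is typical) that both Python floats and awk numbers without -M are IEEE 754 doubles. MAX_SAFE_INT and MIN_SAFE_INT are hypothetical names chosen for illustration; awk defines no such constants.

```python
import sys

# Number of significand bits in a double; 53 on IEEE 754 platforms,
# which matches the PREC value that gawk printed above.
MANT_BITS = sys.float_info.mant_dig

# Hypothetical names: the edges of the range in which every integer
# is exactly representable and +1 / -1 still behave as expected.
MAX_SAFE_INT = 2.0 ** MANT_BITS
MIN_SAFE_INT = -MAX_SAFE_INT

print(MANT_BITS)
print(int(MAX_SAFE_INT))
print(MAX_SAFE_INT + 1 == MAX_SAFE_INT)        # the +1 is lost to rounding
print((MAX_SAFE_INT - 1) + 1 == MAX_SAFE_INT)  # just below the limit, +1 is exact
print(MIN_SAFE_INT - 1 == MIN_SAFE_INT)        # the same happens on the negative side
```

If the assumption about doubles holds on your platform, the three comparisons all print True, mirroring the awk session above.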
High integers in current (g)awk are oddly broken without -M. It is easy to spot that BEGIN {print 2^1024} yields inf, whereas BEGIN {print 2^1023} works. One would therefore assume that the maximum integer in this particular implementation is 2^1024 − 1. Yet this is not the case.
A simple experiment, based on the fact that 2^1024 − 1 = 2^1023 + 2^1022 + ⋯ + 2^1 + 2^0:
BEGIN {for (i = 1023; i >= 0; --i) sum += 2^i; print sum}
This^^^ yields infinity, surprisingly enough. So, at which point do we need to stop adding the powers of 2 in order to obtain a valid result? On my systems the limit appears to be 971: go one step further, down to 970, and it sums to infinity.
BEGIN {for (i = 1023; i >= 971; --i) sum += 2^i; print sum}
This^^^ prints 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.
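The experiment above can be replayed with Python floats to see where the running sum overflows and what the last finite value is. This is a sketch under the assumption that awk without -M and Python both use IEEE 754 doubles; the variable names are illustrative only.

```python
import math
import sys

total = 0.0        # running sum of 2^1023 + 2^1022 + ...
first_bad = None   # exponent whose addition pushed the sum to infinity

for i in range(1023, -1, -1):
    candidate = total + 2.0 ** i
    if math.isinf(candidate):
        first_bad = i
        break
    total = candidate

print(first_bad)                        # exponent at which the sum overflows
print(total == sys.float_info.max)      # is the last finite sum the largest double?
print(int(total) == 2**1024 - 2**971)   # exact integer value of that sum
```

On a platform with IEEE 754 doubles, the last finite sum is the largest finite double, 2^1024 − 2^971, which is the long number printed by awk above.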
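The value has a surprising property in awk: whatever you add to it, up to a certain number, does not change it any more. (Try to print, e.g., sum + 3.) Adding more than a certain threshold (below which the value appears to remain unchanged, based on the print output) yields infinity. This looks like a bug, but it is consistent with IEEE 754 double-precision rounding: the value is the largest finite double, 2^1024 − 2^971, and neighboring doubles at this magnitude are 2^971 apart.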
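The absorption and the threshold can be probed directly. A small sketch in Python, again assuming that awk numbers without -M are IEEE 754 doubles like Python floats; `big` is just an illustrative name for the value discussed above.

```python
import math
import sys

big = sys.float_info.max   # largest finite double, 2^1024 - 2^971

# Addends smaller than half the gap between neighboring doubles
# (half of 2^971, i.e. 2^970) are rounded away entirely.
print(big + 3 == big)
print(big + 2.0 ** 969 == big)

# An addend of 2^970 lands exactly halfway to 2^1024 and rounds up,
# which overflows to infinity.
print(math.isinf(big + 2.0 ** 970))
```

Under that assumption, repeatedly adding small values never reaches infinity, because each addition is rounded back to the same number; only a single addend of about 2^970 or more overflows.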
As for the original sum above (2^1023 + ⋯ + 2^971), it is still correct in awk. Things start to fall apart once you try to increase that sum further. For example (and surprisingly), this still yields the same result as above:
BEGIN {for (i = 1023; i >= 971; --i) sum += 2^i
for (i = 969; i >= 0; --i) sum += 2^i
print sum}
Checking both sums with Python is easy:
total = 0  # Python integers have arbitrary precision
for i in range(971, 1024):
    total += 2**i
print(total)  # awk gets this right
for i in range(0, 970):
    total += 2**i
print(total)  # awk without -M gets this wrong
All in all, I think I will be setting -M in awk all the time from now on!