To my great surprise, I found that rounding a NaN value in Haskell returns a gigantic negative number:
round (0/0)
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824
The same thing happens with floor and ceiling.
What is happening here? Is this behavior intended? Of course, I understand that anyone who doesn't want this behavior can always write another function that checks isNaN - but are there existing alternative standard library functions that handle NaN more sanely (for some definition of "more sanely")?
TL;DR: NaN
have an arbitrary representation between 2 ^ 1024
and 2 ^ 1025
(bounds not included), and - 1.5 * 2 ^ 1024
(which is one possible) NaN
happens to be the one you hit.
What is happening here?
You're entering the region of undefined behaviour. Or at least that is what you would call it in some other languages. The report defines round
as follows:
6.4.6 Coercions and Component Extraction
The
ceiling
,floor
,truncate
, andround
functions each take a real fractional argument and return an integral result. …round x
returns the nearest integer tox
, the even integer ifx
is equidistant between two integers.
In our case x
does not represent a number to begin with. According to 6.4.6, y = round x
should fulfil that any other z
from round
's codomain has an equal or greater distance:
y = round x ⇒ ∀z : dist(z,x) >= dist(y,x)
However, the distance (aka the subtraction) of numbers is defined only for, well, numbers. If we used
dist n d = fromIntegral n - d
we get in trouble soon: any operation that includes NaN
will return NaN
again, and comparisons on NaN
fail, so our property above does not hold for any z
if x
was a NaN
to begin with. If we check for NaN
, we can return any value, but then our property holds for all pairs:
dist n d = if isNaN d then constant else fromIntegral n - d
So we're completely arbitrary in what round x
shall return if x
was not a number.
"OK", I hear you say, "that's all fine and dandy, but why do I get that number?" That's a good question.
Is this behavior intended?
Somewhat. It isn't really intended, but to be expected. First of all, we have to know how Double
works.
A Double
in Haskell is usually a IEEE 754 compliant double precision floating point number, that is a number that has 64 bits and is represented with
x = s * m * (b ^ e)
where s
is a single bit, m
is the mantissa (52 bits) and e
is the exponent (11 bits, floatRange
). b
is the base, and its usually 2
(you can check with floadRadix
). Since the value of m
is normalized, every well-formed Double
has a unique representation.
Except NaN
. NaN
is represented as the emax+1, as well as a non-zero mantissa. So if the bitfield
SEEEEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
represents a Double
, what's a valid way to represent NaN
?
?111111111111000000000000000000000000000000000000000000000000000
^
That is, a single M
is set to 1
, the other are not necessary to set this notion. The sign is arbitrary. Why only a single bit? Because its sufficient.
Double
Now, when we ignore the fact that this is a malformed Double
—a NaN
– and really, really, really want to interpret it as number, what number would we get?
m = 1.5
e = 1024
x = 1.5 * 2 ^ 1024
= 3 * 2 ^ 1024 / 2
= 3 * 2 ^ 1023
And lo and behold, that's exactly the number you get for round (0/0)
:
ghci> round $ 0 / 0
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824
ghci> negate $ 3 * 2 ^ 1023
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824
Which brings our small adventure to a halt. We have a NaN
, which yields a 2 ^ 1024
, and we have some non-zero mantissa, which yields a result with absolute value between 2 ^ 1024 < x < 2 ^ 1025
.
Note that this isn't the only way NaN
can get represented:
In IEEE 754, NaNs are often represented as floating-point numbers with the exponent emax + 1 and nonzero significands. Implementations are free to put system-dependent information into the significand. Thus there is not a unique NaN, but rather a whole family of NaNs.
For more information, see the classic paper on floating point numbers by Goldberg.
This has long been observed as a problem. Here're a few tickets filed against GHC on this very topic:
Unfortunately, this is a thorny issue with lots of ramifications. My personal belief is that this is a genuine bug and it should be fixed properly by throwing an error. But you can read the comments on these tickets to get an understanding of the tricky issues preventing GHC from implementing a proper solution. Essentially, it comes down to speed vs. correctness, and this is one point where (i) the Haskell report is woefully underspecified, and (ii) GHC compromises the latter for the former.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With