Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Round NaN in Haskell

To my great surprise, I found that rounding a NaN value in Haskell returns a gigantic negative number:

round (0/0)
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824

The same thing happens with floor and ceiling.

What is happening here? Is this behavior intended? Of course, I understand that anyone who doesn't want this behavior can always write another function that checks isNaN - but are there existing alternative standard library functions that handle NaN more sanely (for some definition of "more sanely")?

like image 900
echinodermata Avatar asked Jun 06 '17 00:06

echinodermata


2 Answers

TL;DR: NaN have an arbitrary representation between 2 ^ 1024 and 2 ^ 1025 (bounds not included), and - 1.5 * 2 ^ 1024 (which is one possible) NaN happens to be the one you hit.


Why any reasoning is off

What is happening here?

You're entering the region of undefined behaviour. Or at least that is what you would call it in some other languages. The report defines round as follows:

6.4.6 Coercions and Component Extraction

The ceiling, floor, truncate, and round functions each take a real fractional argument and return an integral result. … round x returns the nearest integer to x, the even integer if x is equidistant between two integers.

In our case x does not represent a number to begin with. According to 6.4.6, y = round x should fulfil that any other z from round's codomain has an equal or greater distance:

y = round x ⇒ ∀z : dist(z,x) >= dist(y,x)

However, the distance (aka the subtraction) of numbers is defined only for, well, numbers. If we used

dist n d = fromIntegral n - d

we get in trouble soon: any operation that includes NaN will return NaN again, and comparisons on NaN fail, so our property above does not hold for any z if x was a NaN to begin with. If we check for NaN, we can return any value, but then our property holds for all pairs:

dist n d = if isNaN d then constant else fromIntegral n - d

So we're completely arbitrary in what round x shall return if x was not a number.

Why do we get that large number regardless?

"OK", I hear you say, "that's all fine and dandy, but why do I get that number?" That's a good question.

Is this behavior intended?

Somewhat. It isn't really intended, but to be expected. First of all, we have to know how Double works.

IEE 754 double precision floating point numbers

A Double in Haskell is usually a IEEE 754 compliant double precision floating point number, that is a number that has 64 bits and is represented with

x = s * m * (b ^ e)

where s is a single bit, m is the mantissa (52 bits) and e is the exponent (11 bits, floatRange). b is the base, and its usually 2 (you can check with floadRadix). Since the value of m is normalized, every well-formed Double has a unique representation.

IEEE 754 NaN

Except NaN. NaN is represented as the emax+1, as well as a non-zero mantissa. So if the bitfield

SEEEEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

represents a Double, what's a valid way to represent NaN?

?111111111111000000000000000000000000000000000000000000000000000
            ^

That is, a single M is set to 1, the other are not necessary to set this notion. The sign is arbitrary. Why only a single bit? Because its sufficient.

Interpret NaN as Double

Now, when we ignore the fact that this is a malformed Double—a NaN– and really, really, really want to interpret it as number, what number would we get?

m = 1.5
e = 1024

x = 1.5 * 2 ^ 1024
  = 3 * 2 ^ 1024 / 2
  = 3 * 2 ^ 1023

And lo and behold, that's exactly the number you get for round (0/0):

ghci> round $ 0 / 0
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824
ghci> negate $ 3 * 2 ^ 1023
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824

Which brings our small adventure to a halt. We have a NaN, which yields a 2 ^ 1024, and we have some non-zero mantissa, which yields a result with absolute value between 2 ^ 1024 < x < 2 ^ 1025.

Note that this isn't the only way NaN can get represented:

In IEEE 754, NaNs are often represented as floating-point numbers with the exponent emax + 1 and nonzero significands. Implementations are free to put system-dependent information into the significand. Thus there is not a unique NaN, but rather a whole family of NaNs.

For more information, see the classic paper on floating point numbers by Goldberg.

like image 177
Zeta Avatar answered Nov 18 '22 21:11

Zeta


This has long been observed as a problem. Here're a few tickets filed against GHC on this very topic:

  • https://ghc.haskell.org/trac/ghc/ticket/3070
  • https://ghc.haskell.org/trac/ghc/ticket/11553
  • https://ghc.haskell.org/trac/ghc/ticket/3676

Unfortunately, this is a thorny issue with lots of ramifications. My personal belief is that this is a genuine bug and it should be fixed properly by throwing an error. But you can read the comments on these tickets to get an understanding of the tricky issues preventing GHC from implementing a proper solution. Essentially, it comes down to speed vs. correctness, and this is one point where (i) the Haskell report is woefully underspecified, and (ii) GHC compromises the latter for the former.

like image 2
alias Avatar answered Nov 18 '22 23:11

alias