Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A haskell floating point calculation anomaly?

2022 Update: This bug was filed as a GHC ticket and is now fixed: https://gitlab.haskell.org/ghc/ghc/issues/17231 so this is no longer an issue.

Using ghci 8.6.5

I want to calculate the square root of an Integer input, then round it to the bottom and return an Integer.

square :: Integer -> Integer
square m = floor $ sqrt $ fromInteger m

It works. The problem is, for this specific big number as input:

4141414141414141*4141414141414141

I get a wrong result.

Putting my function aside, I test the case in ghci:

> sqrt $ fromInteger $ 4141414141414141*4141414141414141
4.1414141414141405e15

wrong... right?

BUT SIMPLY

> sqrt $ 4141414141414141*4141414141414141
4.141414141414141e15

which is more like what I expect from the calculation...

In my function I have to make some type conversion, and I reckon fromIntegral is the way to go. So, using that, my function gives a wrong result for the 4141...41 input.

I can't figure out what ghci does implicitly in terms of type conversion, right before running sqrt. Because ghci's conversion allows for a correct calculation.

Why I say this is an anomaly: the problem does not occur with other numbers like 5151515151515151 or 3131313131313131 or 4242424242424242 ...

Is this a Haskell bug?

like image 315
Dvinubius Avatar asked Sep 20 '19 21:09

Dvinubius


2 Answers

TLDR

It comes down to how one converts an Integer value to a Double that is not exactly representable. Note that this can happen not just because Integer is too big (or too small), but Float and Double values by design "skip around" integral values as their magnitude gets larger. So, not every integral value in the range is exactly representable either. In this case, an implementation has to pick a value based on the rounding-mode. Unfortunately, there are multiple candidates; and what you are observing is that the candidate picked by Haskell gives you a worse numeric result.

Expected Result

Most languages, including Python, use what's known as "round-to-nearest-ties-to-even" rounding mechanism; which is the default IEEE754 rounding mode and is typically what you would get unless you explicitly set a rounding mode when issuing a floating-point related instruction in a compliant processor. Using Python as the "reference" here, we get:

>>> float(long(4141414141414141)*long(4141414141414141))
1.7151311090705027e+31

I haven't tried in other languages that support so called big-integers, but I'd expect most of them would give you this result.

How Haskell converts Integer to Double

Haskell, however, uses what's known as truncation, or round-towards-zero. So you get:

*Main> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
1.7151311090705025e31

Turns out this is a "worse" approximation in this case (cf. to the Python produced value above), and you get the unexpected result in your original example.

The call to sqrt is really red-herring at this point.

Show me the code

It all originates from this code: (https://hackage.haskell.org/package/integer-gmp-1.0.2.0/docs/src/GHC.Integer.Type.html#doubleFromInteger)

doubleFromInteger :: Integer -> Double#
doubleFromInteger (S# m#) = int2Double# m#
doubleFromInteger (Jp# bn@(BN# bn#))
    = c_mpn_get_d bn# (sizeofBigNat# bn) 0#
doubleFromInteger (Jn# bn@(BN# bn#))
    = c_mpn_get_d bn# (negateInt# (sizeofBigNat# bn)) 0#

which in turn calls: (https://github.com/ghc/ghc/blob/master/libraries/integer-gmp/cbits/wrappers.c#L183-L190):

/* Convert bignum to a `double`, truncating if necessary
 * (i.e. rounding towards zero).
 *
 * sign of mp_size_t argument controls sign of converted double
 */
HsDouble
integer_gmp_mpn_get_d (const mp_limb_t sp[], const mp_size_t sn,
                       const HsInt exponent)
{
...

which purposefully says the conversion is done rounding-toward zero.

So, that explains the behavior you get.

Why does Haskell do this?

None of this explains why Haskell uses round-towards-zero for integer-to-double conversion. I'd strongly argue that it should use the default rounding mode, i.e., round-nearest-ties-to-even. I can't find any mention whether this was a conscious choice, and it at least disagrees with what Python does. (Not that I'd consider Python the gold standard, but it does tend to get these things right.)

My best guess is it was just coded that way, without a conscious choice; but perhaps other people familiar with the history of numeric programming in Haskell can remember better.

What to do

Interestingly, I found the following discussion dating all the way back to 2008 as a Python bug: https://bugs.python.org/issue3166. Apparently, Python used to do the wrong thing here as well, but they fixed the behavior. It's hard to track the exact history, but it appears Haskell and Python both made the same mistake; Python recovered, but it went unnoticed in Haskell. If this was a conscious choice, I'd like to know why.

So, that's where it stands. I'd recommend opening a GHC ticket so it can be at least documented properly that this is the "chosen" behavior; or better, fix it so that it uses the default rounding mode instead.

Update:

GHC ticket opened: https://gitlab.haskell.org/ghc/ghc/issues/17231

2022 Update:

This is now fixed in GHC; at least as of GHC 9.2.2; but possibly earlier:

GHCi, version 9.2.2: https://www.haskell.org/ghc/  :? for help
Prelude> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
1.7151311090705027e31
like image 103
alias Avatar answered Nov 05 '22 06:11

alias


Not all Integers are exactly representable as Doubles. For those that aren't, fromInteger is in the bad position of needing to make a choice: which Double should it return? I can't find anything in the Report which discusses what to do here, wow!

One obvious solution would be to return a Double that has no fractional part and which represents the integer with the smallest absolute difference from the original of any Double that exists. Unfortunately this appears not to be the decision made by GHC's fromInteger.

Instead, GHC's choice is to return the Double with the largest magnitude that does not exceed the magnitude of the original number. So:

> 17151311090705026844052714160127 :: Double
1.7151311090705025e31
> 17151311090705026844052714160128 :: Double
1.7151311090705027e31

(Don't be fooled by how short the displayed number is in the second one: the Double there is the exact representation of the integer on the line above it; the digits stop there because there are enough to uniquely identify a single Double.)

Why does this matter for you? Well, the true answer to 4141414141414141*4141414141414141 is:

> 4141414141414141*4141414141414141
17151311090705026668707274767881

If fromInteger converted this to the nearest Double, as in plan (1) above, it would choose 1.7151311090705027e31. But since it returns the largest Double less than the input as in plan (2) above, and 17151311090705026844052714160128 is technically bigger, it returns the less accurate representation 1.7151311090705025e31.

Meanwhile, 4141414141414141 itself is exactly representable as a Double, so if you first convert to Double, then square, you get Double's semantics of choosing the representation that is closest to the correct answer, hence plan (1) instead of plan (2).

This explains the discrepancy in sqrt's output: doing your computations in Integer first and getting an exact answer, then converting to Double at the last second, paradoxically is less accurate than converting to Double immediately and doing your computations with rounding the whole way, because of how fromInteger does its conversion! Ouch.

I suspect a patch to modify fromInteger to do something better would be looked on favorably by GHCHQ; in any case I know I would look favorably on it!

like image 6
Daniel Wagner Avatar answered Nov 05 '22 06:11

Daniel Wagner