2022 Update: This bug was filed as a GHC ticket and is now fixed: https://gitlab.haskell.org/ghc/ghc/issues/17231 so this is no longer an issue.
Using ghci 8.6.5
I want to calculate the square root of an Integer input, then round it to the bottom and return an Integer.
square :: Integer -> Integer
square m = floor $ sqrt $ fromInteger m
It works. The problem is, for this specific big number as input:
4141414141414141*4141414141414141
I get a wrong result.
Putting my function aside, I test the case in ghci:
> sqrt $ fromInteger $ 4141414141414141*4141414141414141
4.1414141414141405e15
wrong... right?
BUT SIMPLY
> sqrt $ 4141414141414141*4141414141414141
4.141414141414141e15
which is more like what I expect from the calculation...
In my function I have to make some type conversion, and I reckon fromIntegral is the way to go. So, using that, my function gives a wrong result for the 4141...41 input.
I can't figure out what ghci does implicitly in terms of type conversion, right before running sqrt. Because ghci's conversion allows for a correct calculation.
Why I say this is an anomaly: the problem does not occur with other numbers like 5151515151515151 or 3131313131313131 or 4242424242424242 ...
Is this a Haskell bug?
It comes down to how one converts an Integer
value to a Double
that is not exactly representable. Note that this can happen not just because Integer
is too big (or too small), but Float
and Double
values by design "skip around" integral values as their magnitude gets larger. So, not every integral value in the range is exactly representable either. In this case, an implementation has to pick a value based on the rounding-mode. Unfortunately, there are multiple candidates; and what you are observing is that the candidate picked by Haskell gives you a worse numeric result.
Most languages, including Python, use what's known as "round-to-nearest-ties-to-even" rounding mechanism; which is the default IEEE754 rounding mode and is typically what you would get unless you explicitly set a rounding mode when issuing a floating-point related instruction in a compliant processor. Using Python as the "reference" here, we get:
>>> float(long(4141414141414141)*long(4141414141414141))
1.7151311090705027e+31
I haven't tried in other languages that support so called big-integers, but I'd expect most of them would give you this result.
Integer
to Double
Haskell, however, uses what's known as truncation, or round-towards-zero. So you get:
*Main> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
1.7151311090705025e31
Turns out this is a "worse" approximation in this case (cf. to the Python produced value above), and you get the unexpected result in your original example.
The call to sqrt
is really red-herring at this point.
It all originates from this code: (https://hackage.haskell.org/package/integer-gmp-1.0.2.0/docs/src/GHC.Integer.Type.html#doubleFromInteger)
doubleFromInteger :: Integer -> Double#
doubleFromInteger (S# m#) = int2Double# m#
doubleFromInteger (Jp# bn@(BN# bn#))
= c_mpn_get_d bn# (sizeofBigNat# bn) 0#
doubleFromInteger (Jn# bn@(BN# bn#))
= c_mpn_get_d bn# (negateInt# (sizeofBigNat# bn)) 0#
which in turn calls: (https://github.com/ghc/ghc/blob/master/libraries/integer-gmp/cbits/wrappers.c#L183-L190):
/* Convert bignum to a `double`, truncating if necessary
* (i.e. rounding towards zero).
*
* sign of mp_size_t argument controls sign of converted double
*/
HsDouble
integer_gmp_mpn_get_d (const mp_limb_t sp[], const mp_size_t sn,
const HsInt exponent)
{
...
which purposefully says the conversion is done rounding-toward zero.
So, that explains the behavior you get.
None of this explains why Haskell uses round-towards-zero for integer-to-double conversion. I'd strongly argue that it should use the default rounding mode, i.e., round-nearest-ties-to-even. I can't find any mention whether this was a conscious choice, and it at least disagrees with what Python does. (Not that I'd consider Python the gold standard, but it does tend to get these things right.)
My best guess is it was just coded that way, without a conscious choice; but perhaps other people familiar with the history of numeric programming in Haskell can remember better.
Interestingly, I found the following discussion dating all the way back to 2008 as a Python bug: https://bugs.python.org/issue3166. Apparently, Python used to do the wrong thing here as well, but they fixed the behavior. It's hard to track the exact history, but it appears Haskell and Python both made the same mistake; Python recovered, but it went unnoticed in Haskell. If this was a conscious choice, I'd like to know why.
So, that's where it stands. I'd recommend opening a GHC ticket so it can be at least documented properly that this is the "chosen" behavior; or better, fix it so that it uses the default rounding mode instead.
GHC ticket opened: https://gitlab.haskell.org/ghc/ghc/issues/17231
This is now fixed in GHC; at least as of GHC 9.2.2; but possibly earlier:
GHCi, version 9.2.2: https://www.haskell.org/ghc/ :? for help
Prelude> (fromIntegral $ 4141414141414141*4141414141414141) :: Double
1.7151311090705027e31
Not all Integer
s are exactly representable as Double
s. For those that aren't, fromInteger
is in the bad position of needing to make a choice: which Double
should it return? I can't find anything in the Report which discusses what to do here, wow!
One obvious solution would be to return a Double
that has no fractional part and which represents the integer with the smallest absolute difference from the original of any Double
that exists. Unfortunately this appears not to be the decision made by GHC's fromInteger
.
Instead, GHC's choice is to return the Double
with the largest magnitude that does not exceed the magnitude of the original number. So:
> 17151311090705026844052714160127 :: Double
1.7151311090705025e31
> 17151311090705026844052714160128 :: Double
1.7151311090705027e31
(Don't be fooled by how short the displayed number is in the second one: the Double
there is the exact representation of the integer on the line above it; the digits stop there because there are enough to uniquely identify a single Double
.)
Why does this matter for you? Well, the true answer to 4141414141414141*4141414141414141
is:
> 4141414141414141*4141414141414141
17151311090705026668707274767881
If fromInteger
converted this to the nearest Double
, as in plan (1) above, it would choose 1.7151311090705027e31
. But since it returns the largest Double
less than the input as in plan (2) above, and 17151311090705026844052714160128
is technically bigger, it returns the less accurate representation 1.7151311090705025e31
.
Meanwhile, 4141414141414141
itself is exactly representable as a Double
, so if you first convert to Double
, then square, you get Double
's semantics of choosing the representation that is closest to the correct answer, hence plan (1) instead of plan (2).
This explains the discrepancy in sqrt
's output: doing your computations in Integer
first and getting an exact answer, then converting to Double
at the last second, paradoxically is less accurate than converting to Double
immediately and doing your computations with rounding the whole way, because of how fromInteger
does its conversion! Ouch.
I suspect a patch to modify fromInteger
to do something better would be looked on favorably by GHCHQ; in any case I know I would look favorably on it!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With