
The math behind 1.0999999999999999 in Haskell [duplicate]

Possible Duplicate:
Haskell ranges and floats

Why does the following output occur in Haskell:

[0.1,0.3..1]
[0.1,0.3,0.5,0.7,0.8999999999999999,1.0999999999999999]
  1. What is the math behind the 1.0999999999999999 (I am on a 64-bit Linux machine, if it's useful)?
  2. Why doesn't it stop at 0.8999999999999999 when obviously 1.0999999999999999 is out of range?
asked Nov 02 '12 by footy


2 Answers

Why the overshoot?

[0.1,0.3..1] is short for enumFromThenTo 0.1 0.3 1.0

The Haskell report says

For Float and Double, the semantics of the enumFrom family is given by the rules for Int above, except that the list terminates when the elements become greater than e3 + i∕2 for positive increment i, or when they become less than e3 + i∕2 for negative i.

Here e3 = 1.0 and your increment i = 0.2, so e3 + i∕2 = 1.1. It's only supposed to stop when it goes bigger than that.

You asked it to stop at 1, but with a step of 0.2 the list can only end after an element near 0.9 or one near 1.1. There's a rounding error (floating-point types are inherently inexact), and the element that "should" be 1.1 has ended up as 1.0999999999999999; since that is not greater than 1.0 + i/2, it's allowed into the list.
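You can check this at the prompt (output as I'd expect from a standard 64-bit GHC):

>> last [0.1,0.3..1]
1.0999999999999999
>> last [0.1,0.3..1] <= 1.0 + 0.2/2
True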

In fact, even if the last element were exactly equal to 1.0 + i/2 it would still be included, as you can check using the exact [0.1,0.3..1] :: [Rational] (after importing Data.Ratio).
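With Rational arithmetic the last element is exactly e3 + i/2 = 11/10, and it is still included (roughly what GHCi shows):

>> import Data.Ratio
>> [0.1,0.3..1] :: [Rational]
[1 % 10,3 % 10,1 % 2,7 % 10,9 % 10,11 % 10]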

You can avoid the problem by calculating the upper limit you're aiming for, 0.9, and specifying that: [0.1,0.3..0.9]. You won't suffer from rounding error unless your increment is small and your numbers are large, i.e. you're working beyond the accuracy of Double for large numbers.
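For example (output as I'd expect on a 64-bit machine):

>> [0.1,0.3..0.9]
[0.1,0.3,0.5,0.7,0.8999999999999999]

The last element still carries a little representation error, but there is no extra element past the end.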

Why the inaccuracy?

1.09 recurring is mathematically indistinguishable from 1.1, but here we have a finite number of 9s, and that's strictly less than 1.1.

Floating point numbers are stored as if they're in scientific notation, e.g. 4.563347×10^-7, but in binary, so something like 1.1001110101×2^01101110, with both the significand and the exponent written in binary.

This means your number can only be stored completely accurately as a Float or Double if you can express it as a finite sum of powers of two, just as you can only write a number out exactly in finitely many decimal digits if you can express it as a finite sum of powers of 10.

0.2 in your example is 0.001100110011... in binary, with the 0011 repeating forever, and 1.1 is 1.0001100110011..., again with the 0011 repeating forever.

Since only a finite part of those will be stored, when converted back to decimal to show you, they'll be a little out. Often the difference is so small it gets rounded away again, but sometimes you can see it, as here.
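You can ask GHCi for the exact fractions that actually got stored (values I'd expect for standard 64-bit Doubles):

>> toRational (0.2 :: Double)
3602879701896397 % 18014398509481984
>> toRational (1.1 :: Double)
2476979795053773 % 2251799813685248

Neither is exactly 1/5 or 11/10; they are the nearest fractions with a power-of-two denominator.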

This inherent inaccuracy is why enumFromThenTo lets you go above the top number - it's stopping you from having too few because of rounding errors.

answered by AndrewC


The simple answer

To understand this behaviour, you need to know that the expression [a,b..c] will be desugared into enumFromThenTo a b c where enumFromThenTo is a method of the Enum class.
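You can confirm the two are the same thing in GHCi:

>> [0.1,0.3..1] == enumFromThenTo 0.1 0.3 1
True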

The Haskell standard says that

For Float and Double, the semantics of the enumFrom family is given by the rules for Int above, except that the list terminates when the elements become greater than e3 + i∕2 for positive increment i, or when they become less than e3 + i∕2 for negative i.

Standards are standards, after all. But that's not very satisfactory.

Going deeper

The Double instance of Enum is defined in the module GHC.Float, so let's look there. We find:

instance Enum Double where
  enumFromThenTo = numericEnumFromThenTo

That's not incredibly helpful, but a quick Google search reveals that numericEnumFromThenTo is defined in GHC.Real, so let's go there:

numericEnumFromThenTo e1 e2 e3 = takeWhile pred (numericEnumFromThen e1 e2)
                                where
                                 mid = (e2 - e1) / 2
                                 pred | e2 >= e1  = (<= e3 + mid)
                                      | otherwise = (>= e3 + mid)

That's a bit better. If we assume a sensible definition of numericEnumFromThen, then calling

numericEnumFromThenTo 0.1 0.3 1.0

will result in

takeWhile pred [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3 ...]

Since e2 > e1, the definition of pred is

pred = (<= e3 + mid)
  where
    mid = (e2 - e1) / 2

Therefore we will take elements (call them x) from this list as long as they satisfy x <= e3 + mid. Let's ask GHCi what that value is:

>> let (e1, e2, e3) = (0.1, 0.3, 1.0)
>> let mid = (e2 - e1) / 2
>> e3 + mid
1.1

That's why you see 1.09999... in the list of results.

The reason you see 1.0999... instead of 1.1 is that 1.1 is not exactly representable in binary floating point, and the repeated additions that build the list leave the element sitting just below the Double closest to 1.1.
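In GHCi (again, typical 64-bit behaviour):

>> last ([0.1,0.3..1] :: [Double])
1.0999999999999999
>> (1.1 :: Double) == last ([0.1,0.3..1] :: [Double])
False

The literal 1.1 prints nicely because show produces the shortest decimal that reads back to the same Double, but the enumerated element sits one step below it.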

The reasoning

Why would the standard prescribe such bizarre behaviour? Well, consider what might happen if you only took numbers that satisfied (<= e3). Because of floating point error or nonrepresentability, e3 may never appear in the list of generated numbers at all, which could mean that innocuous expressions like

[0.0,0.02 .. 0.1]

would result in

[0.0, 0.02, 0.04, 0.06, 0.08]

which seems a little bizarre. Because of the correction in numericEnumFromThenTo, we make sure that we get the expected result for this (presumably more common) use case.
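Here is a small standalone sketch (the helper names are mine, not GHC's) contrasting a naive (<= e3) cutoff with the report's (<= e3 + mid) rule, using a range where the accumulated rounding happens to drift upward on a typical 64-bit Double:

-- Hypothetical helpers for comparison; not GHC's actual definitions.
-- 'steps' mirrors the (m + m - n) recurrence used by numericEnumFromThen.
steps :: Double -> Double -> [Double]
steps n m = n : steps m (m + m - n)

naiveTo, reportTo :: Double -> Double -> Double -> [Double]
naiveTo  e1 e2 e3 = takeWhile (<= e3)                 (steps e1 e2)
reportTo e1 e2 e3 = takeWhile (<= e3 + (e2 - e1) / 2) (steps e1 e2)

main :: IO ()
main = do
  print (naiveTo  0.0 0.1 0.3)  -- [0.0,0.1,0.2]                      (endpoint dropped)
  print (reportTo 0.0 0.1 0.3)  -- [0.0,0.1,0.2,0.30000000000000004]  (endpoint kept)

With the naive rule, the element near 0.3 lands a hair above it and is thrown away; the half-step allowance keeps it, which should also be what [0.0,0.1..0.3] evaluates to with the real Enum instance.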

answered by Chris Taylor