Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

floating point equality in Python and in general

I have a piece of code that behaves differently depending on whether I go through a dictionary to get conversion factors or whether I use them directly.

The following piece of code will print 1.0 == 1.0 -> False

But if you replace factors[units_from] with 10.0 and factors[units_to ] with 1.0 / 2.54 it will print 1.0 == 1.0 -> True

#!/usr/bin/env python

base = 'cm'
factors = {
    'cm'        : 1.0,
    'mm'        : 10.0,
    'm'         : 0.01,
    'km'        : 1.0e-5,
    'in'        : 1.0 / 2.54,
    'ft'        : 1.0 / 2.54 / 12.0,
    'yd'        : 1.0 / 2.54 / 12.0 / 3.0,
    'mile'      : 1.0 / 2.54 / 12.0 / 5280,
    'lightyear' : 1.0 / 2.54 / 12.0 / 5280 / 5.87849981e12,
}

# convert 25.4 mm to inches
val = 25.4
units_from = 'mm'
units_to = 'in'

base_value = val / factors[units_from]
ret = base_value * factors[units_to  ]
print ret, '==', 1.0, '->', ret == 1.0

Let me first say that I am pretty sure what is going on here. I have seen it before in C, just never in Python but since Python in implemented in C we're seeing it.

I know that floating point numbers will change values going from a CPU register to cache and back. I know that comparing what should be two equal variables will return false if one of them was paged out while the other stayed resident in a register.

Questions

  • What is the best way to avoid problems like this?... In Python or in general.
  • Am I doing something completely wrong?

Side Note

This is obviously part of a stripped down example but what I'm trying to do is come with with classes of length, volume, etc that can compare against other objects of the same class but with different units.

Rhetorical Questions

  • If this is a potentially dangerous problem since it makes programs behave in an undetermanistic matter, should compilers warn or error when they detect that you're checking equality of floats
  • Should compilers support an option to replace all float equality checks with a 'close enough' function?
  • Do compilers already do this and I just can't find the information.
like image 685
eric.frederich Avatar asked Jun 15 '10 21:06

eric.frederich


3 Answers

As has been shown comparing two floats (or doubles etc) can be problematic. Generally, instead of comparing for exact equality they should be checked against an error bound. If they are within the error bound, they are considered equal.

That is much easier said than done. The nature of floating point make a fixed error bound worthless. A small error bound (like 2*float_epsilon) works well when the values are near 0.0, but will fail if the value are near 1000. An error bound for values as large as 1,000,000.0 will be far too lax for values near 0.0.

The best solution is know the domain of your math and pick an approriate err bound on a case by case basis.

When this is impractical or you're being lazy, Units in the Last Place (ULPs) is a very novel and robust solution. The full details are quite involved, you can read more here.

The basic idea is this, a floating point number has two pieces, mantissa and exponent. Generally rounding errors only change the mantissa by a few steps. When the value is near 0.0 those steps are exactly float_epsilon. When the floating point value is nearer to 1,000,000, the steps will be nearly as large as 1.

Google test uses ULP to compare floating point numbers. They chose a default of 4 ULPs for a two floating point numbers to be compared equal. You could also use their code as reference to build your own ULP style float comparator.

like image 97
deft_code Avatar answered Oct 17 '22 02:10

deft_code


The difference is that if you replace factors[units_to ] with 1.0 / 2.54, you're doing:

(base_value * 1.0) / 2.54

With the dictionary, you're doing:

base_value * (1.0 / 2.54)

The order of rounding matters. This is easier to see if you do:

>>> print (((25.4 / 10.0) * 1.0) / 2.54).__repr__()
1.0
>>> print ((25.4 / 10.0) * (1.0 / 2.54)).__repr__()
0.99999999999999989

Note that there is no non-deterministic or undefined behavior. There is a standard, IEEE-754, which implementations must conform to (not to claim they always do).

I don't think there should be an automatic close enough replacement. That is often an effective way to deal with the problem, but it should be up to the programmer to decide if and how to use it.

Finally, there are of course options for arbitrary-precision arithmetic, including python-gmp and decimal. Think whether you actually need these, because they do have a significant performance impact.

There is no issue with moving between regular registers and cache. You may be thinking of the x86's 80-bit extended precision.

like image 41
Matthew Flaschen Avatar answered Oct 17 '22 02:10

Matthew Flaschen


Let me first answer by saying that you should read David Goldberg's classic What Every Computer Scientist Should Know About Floating-Point Arithmetic.

As some other commentators already said, the discrepancy you notice is intrinsically due to the floating-point model and has nothing to do with registers, cache or memory.

According to the floating-point model, 2.54 is actually represented as

>>> 2859785763380265 * 2 ** -50
2.54

This representation, however is not exact:

>>> from fractions import Fraction
>>> float(Fraction(2859785763380265, 2 ** 50) - Fraction(254, 100))
3.552713678800501e-17

Now, the expression you are evaluating actually is:

>>> 25.4 / 10 * (1/2.54)
0.99999999999999989

The problem lies in the 1/2.54:

>>> Fraction.from_float(1/2.54)
Fraction(1773070719437203, 4503599627370496)

But what you would expect is

>>> 1/Fraction.from_float(2.54)
Fraction(1125899906842624, 2859785763380265)

To answer your questions:

  • It is a difficult problem, but is clearly deterministic, nothing mysterious there.
  • You cannot automatically replace equality with a close-enough comparison. The latter requires that you specify a tolerance, which depends on the problem at hand, i.e. on what kind of precision you are expecting from your results. There are also plenty of situations where you really want equality not a close-enough comparison.
like image 4
krawyoti Avatar answered Oct 17 '22 02:10

krawyoti