I have a piece of code that behaves differently depending on whether I go through a dictionary to get conversion factors or whether I use them directly.
The following piece of code will print 1.0 == 1.0 -> False. But if you replace factors[units_from] with 10.0 and factors[units_to] with 1.0 / 2.54, it will print 1.0 == 1.0 -> True.
#!/usr/bin/env python
base = 'cm'
factors = {
    'cm'        : 1.0,
    'mm'        : 10.0,
    'm'         : 0.01,
    'km'        : 1.0e-5,
    'in'        : 1.0 / 2.54,
    'ft'        : 1.0 / 2.54 / 12.0,
    'yd'        : 1.0 / 2.54 / 12.0 / 3.0,
    'mile'      : 1.0 / 2.54 / 12.0 / 5280,
    'lightyear' : 1.0 / 2.54 / 12.0 / 5280 / 5.87849981e12,
}

# convert 25.4 mm to inches
val = 25.4
units_from = 'mm'
units_to = 'in'
base_value = val / factors[units_from]
ret = base_value * factors[units_to]
print(ret, '==', 1.0, '->', ret == 1.0)
Let me first say that I am pretty sure what is going on here. I have seen it before in C, just never in Python, but since Python is implemented in C we're seeing it here too.
I know that floating point numbers can change values going from a CPU register to cache and back. I know that comparing what should be two equal variables can return False if one of them was paged out while the other stayed resident in a register.
Side Note
This is obviously part of a stripped-down example, but what I'm trying to do is come up with classes for length, volume, etc. that can compare against other objects of the same class but with different units.
As has been shown, comparing two floats (or doubles, etc.) can be problematic. Generally, instead of comparing for exact equality, they should be checked against an error bound: if they are within the error bound, they are considered equal.
That is much easier said than done. The nature of floating point makes a fixed error bound worthless. A small error bound (like 2 * float_epsilon) works well when the values are near 0.0, but will fail if the values are near 1000. An error bound suitable for values as large as 1,000,000.0 will be far too lax for values near 0.0.
The best solution is to know the domain of your math and pick an appropriate error bound on a case-by-case basis.
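In Python, the standard library's math.isclose implements exactly this kind of check, combining a relative tolerance with an optional absolute tolerance for values near zero. A sketch using the conversion from the question:

```python
import math

# The computation from the question, with the factors inlined.
ret = (25.4 / 10.0) * (1.0 / 2.54)

print(ret == 1.0)              # False: exact comparison fails
print(math.isclose(ret, 1.0))  # True: relative tolerance (default rel_tol=1e-9)

# A purely relative tolerance is useless near 0.0; abs_tol covers that case.
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```

Note that abs_tol defaults to 0.0, so for comparisons against zero you must pick an absolute bound yourself, which brings back the domain-knowledge problem described above.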
When this is impractical or you're being lazy, Units in the Last Place (ULPs) is a neat and robust solution. The full details are quite involved; you can read more here.
The basic idea is this: a floating point number has two pieces, a mantissa and an exponent. Generally, rounding errors only change the mantissa by a few steps. When the value is near 1.0, those steps are exactly float_epsilon. When the floating point value is nearer to 1,000,000, the steps will be nearly as large as 1.
Google Test uses ULPs to compare floating point numbers; by default it considers two floats equal if they are within 4 ULPs of each other. You could also use their code as a reference to build your own ULP-style float comparator.
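For illustration, here is a sketch of such a ULP-based comparator in Python. It reinterprets the bits of a double as a signed 64-bit integer, in which view adjacent representable floats of the same sign differ by exactly 1. The function names are my own, not Google Test's:

```python
import math
import struct

def ulp_distance(a, b):
    # Reinterpret each double's bit pattern as a signed 64-bit integer.
    # For finite floats of the same sign, adjacent representable values
    # differ by exactly 1 in this integer view.
    ia = struct.unpack('<q', struct.pack('<d', a))[0]
    ib = struct.unpack('<q', struct.pack('<d', b))[0]
    return abs(ia - ib)

def almost_equal(a, b, max_ulps=4):
    # Values with different signs are never close (except +0.0 vs -0.0,
    # which the fallback exact comparison handles).
    if math.copysign(1.0, a) != math.copysign(1.0, b):
        return a == b
    return ulp_distance(a, b) <= max_ulps

ret = (25.4 / 10.0) * (1.0 / 2.54)
print(ret == 1.0)              # False: the result is 1 ULP below 1.0
print(almost_equal(ret, 1.0))  # True
```

Unlike a relative tolerance, the ULP distance automatically scales with the magnitude of the values being compared, which is what makes it robust across the whole float range.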
The difference is that if you textually replace factors[units_to] with 1.0 / 2.54, the expression becomes base_value * 1.0 / 2.54, which (since * and / have equal precedence and group left to right) evaluates as:
(base_value * 1.0) / 2.54
With the dictionary, the division 1.0 / 2.54 was already rounded to a float once when the dict was built, so you're doing:
base_value * (1.0 / 2.54)
The order of rounding matters. This is easier to see if you do:
>>> ((25.4 / 10.0) * 1.0) / 2.54
1.0
>>> (25.4 / 10.0) * (1.0 / 2.54)
0.9999999999999999
Note that there is no non-deterministic or undefined behavior here. There is a standard, IEEE-754, to which implementations must conform (not that they always do so perfectly).
I don't think there should be an automatic close enough replacement. That is often an effective way to deal with the problem, but it should be up to the programmer to decide if and how to use it.
Finally, there are of course options for arbitrary-precision arithmetic, including python-gmp and decimal. Think whether you actually need these, because they do have a significant performance impact.
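As a concrete illustration, the standard library's fractions module (exact rational arithmetic, a different trade-off from arbitrary-precision floats) removes the rounding entirely, at a real performance cost. Note that the inch factor must be written as the exact ratio 100/254 rather than the float division 1.0 / 2.54:

```python
from fractions import Fraction

# Exact rational conversion factors relative to a 'cm' base.
factors = {
    'cm': Fraction(1),
    'mm': Fraction(10),
    'in': Fraction(100, 254),  # exact, unlike the float 1.0 / 2.54
}

val = Fraction('25.4')          # parsed exactly as 127/5
base_value = val / factors['mm']
ret = base_value * factors['in']
print(ret, '==', 1, '->', ret == 1)  # prints: 1 == 1 -> True
```

Every intermediate value here is an exact rational, so the order of operations no longer matters and the equality holds by construction.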
There is no issue with moving between regular registers and cache. You may be thinking of the x86's 80-bit extended precision.
Let me preface my answer by saying that you should read David Goldberg's classic What Every Computer Scientist Should Know About Floating-Point Arithmetic.
As some other commentators already said, the discrepancy you notice is intrinsically due to the floating-point model and has nothing to do with registers, cache or memory.
According to the floating-point model, 2.54 is actually represented as
>>> 2859785763380265 * 2 ** -50
2.54
This representation, however, is not exact:
>>> from fractions import Fraction
>>> float(Fraction(2859785763380265, 2 ** 50) - Fraction(254, 100))
3.552713678800501e-17
Now, the expression you are evaluating actually is:
>>> 25.4 / 10 * (1 / 2.54)
0.9999999999999999
The problem lies in the 1/2.54:
>>> Fraction.from_float(1/2.54)
Fraction(1773070719437203, 4503599627370496)
But what you would expect is
>>> 1/Fraction.from_float(2.54)
Fraction(1125899906842624, 2859785763380265)
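The gap between those two fractions can be exhibited directly; it is tiny but nonzero, and it is precisely the rounding that the final comparison trips over. A sketch, continuing with the same module:

```python
from fractions import Fraction

# The float 1/2.54, as an exact fraction:
rounded = Fraction.from_float(1 / 2.54)
# The true reciprocal of the float 2.54:
exact = 1 / Fraction.from_float(2.54)

diff = rounded - exact
print(diff == 0)    # False: 1/2.54 could not be represented exactly
print(float(diff))  # tiny, but not zero
```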