Evil in the python decimal / float

I have a large amount of Python code that handles numbers with 4-decimal precision, and I am stuck with Python 2.4 for many reasons. The code does very simplistic math (it's a credit management system that mostly takes or adds credits).

It has intermingled usage of float and Decimal (MySQLdb returns Decimal objects for SQL DECIMAL columns). After several strange bugs surfaced in use, I traced the root cause of all of them to a few places in the code where floats and Decimals are compared.

I got to cases like this:

>>> from decimal import Decimal
>>> max(Decimal('0.06'), 0.6)
Decimal("0.06")

Now my fear is that I might not be able to catch all such cases in the code (a normal programmer will keep writing x > 0 instead of x > Decimal('0.0000'), and that is very hard to avoid).

I have come up with a patch (inspired by improvements to the decimal package in Python 2.7):

import decimal
from decimal import Decimal

def _convert_other(other):
    """Convert other to Decimal.

    Verifies that it's ok to use in an implicit construction.
    """
    if isinstance(other, Decimal):
        return other
    if isinstance(other, (int, long)):
        return Decimal(other)
    # Our small patch begins
    if isinstance(other, float):
        return Decimal(str(other))
    # Our small patch ends
    return NotImplemented

decimal._convert_other = _convert_other

I just apply this in a library that loads very early, and it changes the decimal package's behavior by converting floats to Decimal before comparisons (so we never fall through to Python's default object-to-object comparison).
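
For example, after importing the patch module early (decimal_patch is just a placeholder name for wherever the snippet above lives), the comparison from the start of the question behaves the way I would expect:

>>> import decimal_patch   # hypothetical module containing the patch above
>>> from decimal import Decimal
>>> Decimal('0.06') < 0.6
True
>>> Decimal('0.06') > 0.6
False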

I specifically used "str" instead of "repr", as it smooths over some of float's representation artifacts. E.g.:

>>> Decimal(str(0.6))
Decimal("0.6")
>>> Decimal(repr(0.6))
Decimal("0.59999999999999998")

Now my question is: am I missing anything here? Is this fairly safe, or am I breaking something? (I am guessing the authors of the package had very strong reasons to avoid floats so thoroughly.)

asked Nov 15 '10 by Ehsan Foroughi


2 Answers

I think you want raise NotImplementedError() instead of return NotImplemented, to start.

What you're doing is called "monkey patching", and it is OK to do, so long as you know what you're doing, are aware of the fallout, and are OK with that fallout. Generally you limit this to fixing a bug, or some other change where you know your alteration of the behavior is still correct and backwards compatible.

In this case, because you're patching a class, you can change behavior outside of the cases where you use it. If another library uses decimal, and somehow relies on the default behavior, it might cause subtle bugs. The trouble is you don't really know unless you audit all your code, including any dependencies, and find all the call sites.

Basically - do it at your own risk.

Personally I find it more reassuring to fix all my code, add tests, and make it harder to do the wrong thing (e.g., use wrapper classes or helper functions). Another approach would be to instrument your code with your patch to find all the call sites, then go back and fix them.
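
As a rough sketch of the helper-function idea (the names to_decimal and credit_gt are made up for this example, and it reuses the same str()-based conversion the question's patch does):

from decimal import Decimal

def to_decimal(value):
    """Coerce Decimals, ints, longs, floats and numeric strings to Decimal."""
    if isinstance(value, Decimal):
        return value
    if isinstance(value, float):
        # Same str()-based rounding the question's patch relies on.
        return Decimal(str(value))
    return Decimal(value)

def credit_gt(a, b):
    """Compare two credit amounts as Decimals, never as raw floats."""
    return to_decimal(a) > to_decimal(b)

Once every comparison of credit amounts goes through a helper like this, the remaining bare float comparisons are much easier to grep for and fix.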

Edit - I guess I should add that the probable reason they avoided floats is that floats can't accurately represent all decimal numbers, which is important if you're dealing with money.

answered Sep 28 '22 by Richard Levasseur


There are very good reasons to avoid floats. With floats you cannot reliably do comparisons such as ==, >, <, etc., because of floating-point noise. Every floating-point operation can accumulate a bit more of it: it starts as very small digits at the very end, e.g. 1.000...002, but it can eventually grow into something like 1.0000000453436.
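
For a quick illustration of that accumulation (the exact digits will vary with the Python version and platform, but the pattern is the same):

>>> total = 0.0
>>> for _ in range(10):
...     total += 0.1
...
>>> total == 1.0
False
>>> total               # noise at the very end
0.99999999999999989
>>> str(total)          # str() still hides it here, but only up to a point
'1.0'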

Using str() may work for you if you don't do many floating-point computations, but if you do a lot of them, the noise will eventually grow big enough that str() gives you the wrong answer.

In sum, if (1) you don't do that many floating-point computations, or (2) you don't need to do comparisons like ==, >, <, etc., then you might be OK.

If you want to be sure, then remove all floating point code.

answered Sep 28 '22 by gaefan