
Python (strangely) rounding values [duplicate]

This question is more out of curiosity.

I'm creating the following array:

from numpy import zeros

A = zeros((2,2))
for i in range(2):
    A[i,i] = 0.6          # diagonal entries
    A[(i+1)%2,i] = 0.4    # off-diagonal entries
print A

>>>
[[ 0.6  0.4]
 [ 0.4  0.6]]

Then, printing it:

for i,c in enumerate(A):
    for j,d in enumerate(c):
        print j, d

This prints:

>>>
0 0.6
1 0.4
0 0.4
1 0.6

But if I remove the j from the for statement, I get:

(0, 0.59999999999999998)
(1, 0.40000000000000002)
(0, 0.40000000000000002)
(1, 0.59999999999999998)

Is it because of the way I'm creating the matrix, using 0.6? How are real values represented internally?

Pedro Dusso asked Nov 30 '22 21:11


2 Answers

There are a few different things going on here.

First, Python has two mechanisms for turning an object into a string, called repr and str. repr is supposed to give 'faithful' output that would (ideally) make it easy to recreate exactly that object, while str aims for more human-readable output. For floats in Python versions up to and including Python 3.1, repr gives enough digits to determine the value of the float completely (so that evaluating the returned string gives back exactly that float), while str rounds to 12 significant digits; this has the effect of hiding inaccuracies, but means that two distinct floats that are very close together can end up with the same str value, something that can't happen with repr. When you print an object, you get the str of that object. In contrast, when you just evaluate an expression at the interpreter prompt, you get the repr.

For example (here using Python 2.7):

>>> x = 1.0 / 7.0
>>> str(x)
'0.142857142857'
>>> repr(x)
'0.14285714285714285'
>>> print x  # print uses 'str'
0.142857142857
>>> x  # the interpreter read-eval-print loop uses 'repr'
0.14285714285714285
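
On Python 3.2 and later, str and repr of a float return the same string, so the session above can't be reproduced directly there. Here is a minimal sketch of the same effect, assuming that the format spec '.12g' (12 significant digits) is a fair stand-in for the old str of a float; it can differ in small details, such as a trailing '.0':

```python
# Sketch for Python 3: emulate the old 12-significant-digit rounding.
# Assumption: '.12g' stands in for what str() did to floats in Python 2.
x = 1.0 / 7.0

short = format(x, '.12g')  # rounded, human-friendly form
full = repr(x)             # shortest string that round-trips exactly

print(short)
print(full)

# The rounded string no longer identifies the float; the repr does.
print(float(short) == x)
print(float(full) == x)
```

The last two lines show why repr is the 'faithful' form: converting it back gives exactly the original float, while the rounded string gives a nearby but different one.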

But also, a little bit confusingly from your point of view, we get:

>>> x = 0.4
>>> str(x)
'0.4'
>>> repr(x)
'0.4'

That doesn't seem to tie in too well with what you were seeing above, but we'll come back to this below.

The second thing to bear in mind is that in your first example, you're printing two separate items, while in your second example (with the j removed), you're printing a single item: a tuple of length 2. Somewhat surprisingly, when converting a tuple for printing with str, Python nevertheless uses repr to compute the string representation of the elements of that tuple:

>>> x = 1.0 / 7.0
>>> print x, x  # print x twice;  uses str(x)
0.142857142857 0.142857142857
>>> print(x, x)  # print a single tuple; uses repr(x)
(0.14285714285714285, 0.14285714285714285)

That explains why you're seeing different results in the two cases, even though the underlying floats are the same.
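
The tuple behaviour isn't specific to floats or to Python 2. A small sketch using a hypothetical class (Demo is invented purely for illustration) whose str and repr differ visibly, runnable on Python 3:

```python
class Demo:
    """Hypothetical class whose str and repr are easy to tell apart."""

    def __str__(self):
        return 'str-form'

    def __repr__(self):
        return 'repr-form'


d = Demo()

alone = str(d)          # str of the object itself uses __str__
in_tuple = str((d, d))  # str of a tuple uses repr on each element

print(alone)
print(in_tuple)
```

So even though print asks the tuple for its str, the elements inside it are rendered with repr.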

But there's one last piece to the puzzle. In Python >= 2.7, we saw above that for the particular float 0.4, the str and repr of that float were the same. So where does the 0.40000000000000002 come from? Well, you don't have Python floats here: because you're getting these values from a NumPy array, they're actually of type numpy.float64:

>>> from numpy import zeros
>>> A = zeros((2, 2))
>>> A[:] = [[0.6, 0.4], [0.4, 0.6]]
>>> A
array([[ 0.6,  0.4],
       [ 0.4,  0.6]])
>>> type(A[0, 0])
<type 'numpy.float64'>

That type still stores a double-precision float, just like Python's float, but it's got some extra goodies that make it interact nicely with the rest of NumPy. And it turns out that NumPy uses a slightly different algorithm for computing the repr of a numpy.float64 than Python uses for computing the repr of a float. Python (in versions >= 2.7) aims to give the shortest string that still gives an accurate representation of the float, while NumPy simply outputs a string based on rounding the underlying value to 17 significant digits. Going back to that 0.4 example above, here's what NumPy does:

>>> from numpy import float64
>>> x = float64(1.0 / 7.0)
>>> str(x)
'0.142857142857'
>>> repr(x)
'0.14285714285714285'
>>> x = float64(0.4)
>>> str(x)
'0.4'
>>> repr(x)
'0.40000000000000002'

So these three things together should explain the results you're seeing. Rest assured that this is all completely cosmetic: the underlying floating-point value is not being changed in any way; it's just being displayed differently by the four different possible combinations of str and repr for the two types: float and numpy.float64.
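
To check that the difference is only cosmetic, here is a rough sketch using plain Python floats (so it doesn't need NumPy). It assumes that formatting with '.17g' is a reasonable stand-in for the 17-significant-digit rounding described above:

```python
# Compare the shortest round-tripping string with a 17-significant-digit one.
# Assumption: '.17g' approximates the numpy.float64 repr described above.
values = [0.6, 0.4]

shortest = [repr(v) for v in values]
seventeen = [format(v, '.17g') for v in values]

for s, t in zip(shortest, seventeen):
    # Both strings parse back to the very same float.
    print(s, t, float(s) == float(t))
```

Both spellings of each number convert back to the identical double, which is why nothing is lost or changed by choosing one display over the other.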

The Python tutorial gives more details of how Python floats are stored and displayed, together with some of the potential pitfalls. The answers to this SO question have more information on the difference between str and repr.

Mark Dickinson answered Dec 05 '22 02:12


Edit:

Don't mind me, I failed to realise that the question was about NumPy.


The strange 0.59999999999999998 and friends are Python's best attempt to accurately represent how computers store floating-point values: as a fixed number of bits, according to the IEEE 754 standard. Notably, 0.1 has a non-terminating expansion in binary, and so cannot be stored exactly. (The same is true of 0.6 and 0.4.)
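
One way to see the stored values for yourself, as a small sketch using only the standard library on Python 3 (the choice of 0.5 as a contrasting, exactly representable value is mine):

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) and Fraction(float) convert the stored double exactly,
# without any rounding, so they reveal what is really in memory.
exactly_representable = {}
for text in ('0.1', '0.4', '0.6', '0.5'):
    stored = float(text)
    exactly_representable[text] = Fraction(stored) == Fraction(text)
    print(text, '->', Decimal(stored), exactly_representable[text])
```

Only 0.5, which is a power of two, comes out as exactly representable; the others are stored as the nearest available double.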

The reason you normally see 0.6 is that most floating-point printing functions round off these imprecisely stored floats, to make them more understandable to us humans. That's what your first printing example is doing.

Under some circumstances (that is, when the printing functions aren't aiming for human-readable output), the full, slightly-off number 0.59999999999999998 will be printed. That's what your second printing example is doing.

tl;dr

This is not Python's fault; it is just how floats are stored.

michaelb958--GoFundMonica answered Dec 05 '22 02:12