
Assigning float as a dictionary key changes its precision (Python)

I have a list of floats (actually it's a pandas Series object, if it changes anything) which looks like this:

mySeries:

...
22      16.0
23      14.0
24      12.0
25      10.0
26       3.1
...

(So elements of this Series are on the right, indices on the left.) Then I'm trying to assign the elements from this Series as keys in a dictionary, and indices as values, like this:

{ mySeries[i]: i for i in mySeries.index }

and I'm getting pretty much what I wanted, except that...

{ 6400.0: 0, 66.0: 13, 3.1000000000000001: 23, 133.0: 10, ... }

Why has 3.1 suddenly changed into 3.1000000000000001? I guess this has something to do with the way the floating point numbers are represented (?) but why does it happen now and how do I avoid/fix it?

EDIT: Please feel free to suggest a better title for this question if it's inaccurate.

EDIT2: Ok, so it seems that it's the exact same number, just printed differently. Still, if I assign mySeries[26] as a dictionary key and then I try to run:

myDict[mySeries[26]]

I get a KeyError. What's the best way to avoid it?

asked Oct 06 '16 17:10 by machaerus


2 Answers

The dictionary isn't changing the value of 3.1; it is displaying the float at full precision. Printing mySeries rounds the displayed value, so what you see there is an approximation of the same stored number.

You can prove this:

pd.set_option('display.precision', 20)

Then view mySeries.

0    16.00000000000000000000
1    14.00000000000000000000
2    12.00000000000000000000
3    10.00000000000000000000
4     3.10000000000000008882
dtype: float64
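
The same point can be checked without pandas. The following is a minimal sketch using only the standard library (the variable names are illustrative and not from the question): one stored double, shown with different numbers of digits.

```python
from decimal import Decimal

value = 3.1  # the literal is stored as the nearest binary double

# Different formatting requests reveal different amounts of the same stored number.
short_text = format(value, ".6g")   # 6 significant digits, like a compact display
long_text = format(value, ".17g")   # 17 significant digits always round-trip a double
exact_text = str(Decimal(value))    # exact decimal expansion of the stored double

print(short_text)
print(long_text)
print(exact_text)

# Parsing the long form gives back the identical float.
assert float(long_text) == value
```

Only the text changes between the three lines printed; the float itself is never modified.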

EDIT:

"What every computer programmer should know about floating point arithmetic" is always a good read.

EDIT:

Regarding the KeyError, I was not able to replicate the problem.

>>> import pandas as pd
>>> x = pd.Series([16,14,12,10,3.1])
>>> a = {x[i]: i for i in x.index}
>>> a[x[4]]
4
>>> a.keys()  # Python 2: keys() returns a list
[16.0, 10.0, 3.1000000000000001, 12.0, 14.0]
>>> hash(x[4])
2093862195
>>> hash(a.keys()[2])
2093862195
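
The hash comparison above can be sketched with plain Python floats as well. This assumes nothing beyond the standard library, and the names `key_as_typed`, `key_as_printed` and `lookup` are made up for illustration: the short and long spellings parse to the same double, so they compare equal, hash equal, and find the same dictionary entry.

```python
key_as_typed = 3.1
key_as_printed = float("3.1000000000000001")  # the long form shown in the dict

# Both spellings parse to the same double, so they compare and hash equal.
same_value = key_as_typed == key_as_printed
same_hash = hash(key_as_typed) == hash(key_as_printed)

lookup = {key_as_printed: 4}
found = lookup[key_as_typed]  # succeeds because the two keys are equal

print(same_value, same_hash, found)
```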
answered Oct 12 '22 23:10 by Logan Byers


The value is already that way in the Series:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.Series([16,14,12,10,3.1])
>>> x
0    16.0
1    14.0
2    12.0
3    10.0
4     3.1
dtype: float64
>>> x.iloc[4]
3.1000000000000001

This has to do with floating point precision:

>>> np.float64(3.1)
3.1000000000000001

See "Floating point precision in Python array" for more information about this.

Concerning the KeyError in your edit, I was not able to reproduce it. See below:

>>> d = {x[i]:i for i in x.index}
>>> d
{16.0: 0, 10.0: 3, 12.0: 2, 14.0: 1, 3.1000000000000001: 4}
>>> x[4]
3.1000000000000001
>>> d[x[4]]
4

My suspicion is that the KeyError is coming from the Series: what is mySeries[26] returning?
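
One way a float-keyed lookup can fail, offered as a possibility rather than a diagnosis of the case in the question, is when the lookup value was computed separately and differs from the stored key in its last bits. The sketch below uses made-up values and a hypothetical helper `normalize`. Rounding on both insertion and lookup is one common workaround when the keys are known to have a fixed number of decimals.

```python
def normalize(value, ndigits=1):
    """Hypothetical helper: round a float so near-equal values share one key."""
    return round(float(value), ndigits)

computed = 0.1 + 0.2        # not bit-identical to the literal 0.3
raw = {0.3: "row"}
raw_hit = computed in raw   # False: the two doubles differ

rounded = {normalize(0.3): "row"}
rounded_hit = normalize(computed) in rounded  # True after rounding both sides

print(raw_hit, rounded_hit)
```

To tell which step raises, it can also help to evaluate mySeries[26] on its own line before using the result as a key.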

answered Oct 12 '22 23:10 by brianpck