Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unexpected floating-point representations in Python

Hello I am using a dictionary in Python storing some cities and their population like that:

population = { 'Shanghai' : 17.8, 'Istanbul' : 13.3, 'Karachi' : 13.0, 'mumbai' : 12.5 }

Now if I use the command print population, I get the result:

{'Karachi': 13.0, 'Shanghai': 17.800000000000001, 'Istanbul': 13.300000000000001, 'mumbai': 12.5}

whereas if I use the command print population['Shanghai'] I get the initial input of 17.8.

My question to you is how does the 17.8 and the 13.3 turned into 17.800000000000001 and 13.300000000000001 respectively? How was all that information produced? And why is it stored there, since my initial input denotes that I do not need that extra information, at least as far as I know.

like image 816
NlightNFotis Avatar asked Jan 17 '23 06:01

NlightNFotis


1 Answers

This has been changed in Python 3.1. From the what's new page:

Python now uses David Gay’s algorithm for finding the shortest floating point representation that doesn’t change its value. This should help mitigate some of the confusion surrounding binary floating point numbers.

The significance is easily seen with a number like 1.1 which does not have an exact equivalent in binary floating point. Since there is no exact equivalent, an expression like float('1.1') evaluates to the nearest representable value which is 0x1.199999999999ap+0 in hex or 1.100000000000000088817841970012523233890533447265625 in decimal. That nearest value was and still is used in subsequent floating point calculations.

What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).

The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.

The new algorithm tends to emit cleaner representations when possible, but it does not change the underlying values. So, it is still the case that 1.1 + 2.2 != 3.3 even though the representations may suggest otherwise.

The new algorithm depends on certain features in the underlying floating point implementation. If the required features are not found, the old algorithm will continue to be used. Also, the text pickle protocols assure cross-platform portability by using the old algorithm.

(Contributed by Eric Smith and Mark Dickinson; issue 1580)

like image 169
Katriel Avatar answered Jan 18 '23 20:01

Katriel