
Why does Python's float raise ValueError for some very long inputs?

Tags:

python

On my Python 2.7.9 on x64 I see the following behavior:

>>> float("10"*(2**28))
inf
>>> float("10"*(2**29))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010
>>> float("0"*(2**33))
0.0
>>> float("0." + "0"*(2**32))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Unless there's some deeper rationale I'm missing, this violates the principle of least surprise. When I got the ValueError on "10"*(2**29), I figured it was just a limitation on very long strings, but then "0"*(2**33) worked. What's going on? Can anyone justify why this behavior isn't a POLA bug (if perhaps a relatively irrelevant one)?

FakeName123 asked Jun 21 '16 02:06



1 Answer

Because the zeros are skipped when inferring the base

I like to look to my favourite reference implementation for questions like this.


The Proof

Casevh has a great intuition in the comments. Here's the relevant code:

for (bits_per_char = -1; n; ++bits_per_char)
    n >>= 1;

/* n <- total # of bits needed, while setting p to end-of-string */
while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base)
    ++p;
*str = p;

/* n <- # of Python digits needed, = ceiling(n/PyLong_SHIFT). */
n = (p - start) * bits_per_char + PyLong_SHIFT - 1;
if (n / bits_per_char < p - start) {
    PyErr_SetString(PyExc_ValueError,"long string too large to convert");
    return NULL;
}

Where p is initially set to point at the start of your string. If we look at the _PyLong_DigitValue table, we see that the character 0 is explicitly mapped to the value 0.
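
As a rough illustration of the excerpt above, here is a small Python model of its two computations: the loop that derives bits_per_char and the overflow guard. It assumes that n starts out equal to base (the excerpt doesn't show that line) and that base is a power of two. The shift and width parameters are made-up stand-ins for PyLong_SHIFT and the width of the C integer, and the masking only imitates C wraparound. Note also that these names come from CPython's long-integer code, so this models that check rather than the float parser itself.

```python
def bits_per_char(base):
    """Mirror the C loop: count how many times n can be shifted right."""
    n = base            # assumption: the C code starts with n = base
    bits = -1
    while n:
        n >>= 1
        bits += 1
    return bits


def too_large(num_chars, bits, shift=30, width=64):
    """Imitate the guard: compute n with wraparound, then divide back."""
    mask = (1 << width) - 1                      # hypothetical 64-bit integer
    n = (num_chars * bits + shift - 1) & mask    # wraps like a C integer would
    return n // bits < num_chars


print(bits_per_char(2), bits_per_char(16))   # bits per digit for bases 2 and 16
print(too_large(1000, 4))                     # short string: guard does not fire
print(too_large(2**62, 4))                    # product wraps around: guard fires
```

The guard works by dividing the (possibly wrapped) total back by bits_per_char: if the result is smaller than the character count, the multiplication must have overflowed.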

Python does a lot of extra work to optimize the conversion of particular bases (there's a fun 200-line comment about converting binary!), which is why it goes to such lengths to infer the correct base first. In this case, we can skip over zeros when inferring the base, so they don't count in the overflow calculation.

Indeed, we are checking how many bits are needed to store this float, but Python is smart enough to remove leading zeros from this calculation. I don't see anything in the docs of the float function guaranteeing this behaviour across implementations. They ominously state:

Convert a string or number to a floating point number, if possible.
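
At sizes that fit comfortably in memory, the zero-skipping is easy to observe. The snippet below is only a small-scale check and does not reproduce the multi-gigabyte inputs from the question: leading zeros leave the value unchanged, and a digit string that is merely too big in magnitude becomes inf instead of raising.

```python
# Leading zeros do not change the parsed value.
padded = float("0" * 100000 + "1.5")
print(padded)

# A 400-digit integer string exceeds the double range and becomes inf.
huge = float("10" * 200)
print(huge)

# A string made only of zeros is simply zero.
all_zeros = float("0" * 100000)
print(all_zeros)
```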


When Does This Not Work

When you write

   float("0." + "0"*(2**32))

it stops parsing for the base early on, so all the remaining zeros are considered in the bit-length calculation and contribute to raising the ValueError.
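
To make the two cases concrete, here is a toy model of the explanation above, not CPython's actual code. The names counted_digits, toy_float and TOY_LIMIT are hypothetical: zeros are dropped only up to the first character that isn't a zero, and every digit after that counts toward the limit.

```python
TOY_LIMIT = 8  # hypothetical cap, tiny so the effect shows up on short strings


def counted_digits(text):
    """Count the digits left after stripping leading zeros (toy model)."""
    rest = text.lstrip("0")          # stops at the first non-zero character, e.g. "."
    return sum(ch.isdigit() for ch in rest)


def toy_float(text):
    """Parse like float(), but reject inputs with too many counted digits."""
    if counted_digits(text) > TOY_LIMIT:
        raise ValueError("could not convert string to float (toy limit)")
    return float(text)


print(toy_float("0" * 20))           # all zeros are skipped, so this parses
try:
    toy_float("0." + "0" * 20)       # skipping stops at ".", so 20 zeros get counted
except ValueError as exc:
    print("ValueError:", exc)
```

With a limit this small, "0"*20 still parses while "0." + "0"*20 is rejected, which is the same asymmetry the question shows at much larger sizes.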


Similar Parsing Tricks

Here's a similar case in the float class, where we find that whitespace is ignored (along with an interesting comment from the authors on their intent with this design choice):

while (Py_ISSPACE(*s))    
    s++;

/* We don't care about overflow or underflow.  If the platform
 * supports them, infinities and signed zeroes (on underflow) are    
 * fine. */
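
The behaviour that comment describes can be seen from Python directly. A short check, using nothing beyond the built-in float and the math module, and assuming a platform with IEEE 754 doubles: surrounding whitespace is ignored, overflow yields an infinity, and underflow yields a signed zero.

```python
import math

spaced = float("   1.5\n")        # whitespace on both sides is ignored
overflowed = float("1e400")       # too large for a double
underflowed = float("-1e-400")    # too small in magnitude for a double

print(spaced, overflowed, underflowed)

# copysign reveals the sign that a plain == 0.0 comparison would hide
print(math.copysign(1.0, underflowed))
```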
en_Knight answered Oct 10 '22 21:10