On my Python 2.7.9 on x64 I see the following behavior:
>>> float("10"*(2**28))
inf
>>> float("10"*(2**29))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010
>>> float("0"*(2**33))
0.0
>>> float("0." + "0"*(2**32))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Unless there's some deeper rationale I'm missing, this violates least surprise. When I got the ValueError on "10"*(2**29), I figured it was just a limitation on very long strings, but then "0"*(2**33) worked. What's going on? Can anyone justify why this behavior isn't a POLA bug (if perhaps a relatively irrelevant one)?
Python float values are represented as 64-bit double-precision values. The largest finite value such a number can hold is approximately 1.8 × 10^308. Any value greater than this is reported as inf in Python.
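To make the limit concrete, here is a small check, assuming a CPython build whose float is an IEEE 754 double (which is the usual case). The variable names are only for illustration.

```python
import sys

# Largest finite double on this platform (about 1.8e308 for IEEE 754 binary64).
largest = sys.float_info.max
print(largest)

# A decimal string beyond that range converts to infinity instead of raising.
too_big = float("1e309")
print(too_big == float("inf"))

# The same happens with a long run of digits and no exponent.
many_digits = float("9" * 400)
print(many_digits == float("inf"))
```

This accounts for the inf result in the question, but not for the ValueError on the longer string.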
Because the zeros are skipped when inferring the base.
I like to look to my favourite reference implementation for questions like this.
The Proof
Casevh has a great intuition in the comments. Here's the relevant code:
for (bits_per_char = -1; n; ++bits_per_char)
    n >>= 1;
/* n <- total # of bits needed, while setting p to end-of-string */
while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base)
    ++p;
*str = p;
/* n <- # of Python digits needed, = ceiling(n/PyLong_SHIFT). */
n = (p - start) * bits_per_char + PyLong_SHIFT - 1;
if (n / bits_per_char < p - start) {
    PyErr_SetString(PyExc_ValueError, "long string too large to convert");
    return NULL;
}
Where p is initially set to the pointer to your string. If we look at the _PyLong_DigitValue table, we see that 0 is explicitly mapped to 0.
Python does a lot of extra work to optimize the conversion of particular bases (there's a fun 200-line comment about converting binary!), which is why it goes to such lengths to infer the correct base first. In this case, we can skip over zeros when inferring the base, so they don't count in the overflow calculation.
Indeed, we are checking how many bits are needed to store this float, but Python is smart enough to remove leading zeros from this calculation. I don't see anything in the docs of the float function guaranteeing this behaviour across implementations. They ominously state:
Convert a string or number to a floating point number, if possible.
When Does this not Work
When you write
float("0." + "0"*(2**32))
it stops parsing for the base early on. All the remaining zeros are counted in the bit-length calculation and contribute to raising the ValueError.
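The pattern described here (leading zeros are skipped first, and only what remains counts toward a length limit) can be reproduced with a toy model. This is not CPython's parsing code: `toy_parse` and `TOY_MAX_DIGITS` are hypothetical names, and the limit is deliberately tiny so the effect shows up with short strings.

```python
# Toy model only: both names are made up for illustration, and the limit
# is far smaller than anything a real interpreter would use.
TOY_MAX_DIGITS = 8

def toy_parse(text):
    """Skip leading zeros, then reject inputs with too many remaining characters."""
    remaining = text.lstrip("0")
    if len(remaining) > TOY_MAX_DIGITS:
        raise ValueError("too many digits to convert")
    if remaining in ("", "."):
        # Nothing but zeros (and possibly a bare point) was supplied.
        return 0.0
    return float(remaining)

print(toy_parse("0" * 100))   # all zeros are skipped, so no limit is hit
print(toy_parse("10" * 3))    # 6 characters remain, under the limit

for sample in ("10" * 5, "0." + "0" * 20):
    try:
        toy_parse(sample)
    except ValueError as exc:
        print("rejected:", exc)
```

In this model the zeros after the decimal point are never stripped, so "0." followed by many zeros fails while a bare run of zeros succeeds, matching the asymmetry in the question.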
Similar Parsing Tricks
Here's a similar case in the float class, where we find that whitespace is ignored (along with an interesting comment from the authors on their intent with this design choice):
while (Py_ISSPACE(*s))
    s++;
/* We don't care about overflow or underflow. If the platform
* supports them, infinities and signed zeroes (on underflow) are
* fine. */
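Both points from that excerpt can be observed from Python. This is a quick check, assuming a CPython build with IEEE 754 doubles. The variable names are only for illustration.

```python
import math

# Leading and trailing whitespace is ignored by float().
padded = float("   2.5\n")
print(padded)

# Underflow yields a zero that keeps its sign rather than an error.
tiny_negative = float("-1e-400")
print(tiny_negative == 0.0)
print(math.copysign(1.0, tiny_negative))  # the sign survives on the zero

# Overflow yields an infinity rather than an error.
huge = float("1e400")
print(math.isinf(huge))
```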