Why does Python's float raise ValueError for some very long inputs?

Tags:

python

On my Python 2.7.9 on x64 I see the following behavior:

>>> float("10"*(2**28))
inf
>>> float("10"*(2**29))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010
>>> float("0"*(2**33))
0.0
>>> float("0." + "0"*(2**32))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Unless there's some deeper rationale I'm missing, this violates least surprise. When I got the ValueError on "10"*(2**29) I figured it was just a limitation on very long strings, but then "0"*(2**33) worked. What's going on? Can anyone justify why this behavior isn't a POLA bug (if perhaps a relatively irrelevant one)?


asked Jun 21 '16 02:06

FakeName123

1 Answer

Because the zeros are skipped when inferring the base

I like to look to my favourite reference implementation for questions like this.

The Proof

Casevh has a great intuition in the comments. Here's the relevant code:

for (bits_per_char = -1; n; ++bits_per_char)
    n >>= 1;

/* n <- total # of bits needed, while setting p to end-of-string */
while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base)
    ++p;
*str = p;

/* n <- # of Python digits needed, = ceiling(n/PyLong_SHIFT). */
n = (p - start) * bits_per_char + PyLong_SHIFT - 1;
if (n / bits_per_char < p - start) {
    PyErr_SetString(PyExc_ValueError, "long string too large to convert");
    return NULL;
}

Here p is initially set to the pointer to your string. If we look at the _PyLong_DigitValue table, we see that the character 0 is explicitly mapped to the value 0.

Python does a lot of extra work to optimize the conversion of particular bases (there's a fun 200-line comment about converting binary!), which is why it puts so much effort into inferring the correct base first. In this case, we can skip over zeros when inferring the base, so they don't count in the overflow calculation.
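As a rough illustration of the check quoted above, here is a Python sketch that mimics the C arithmetic. The names wrap_ssize, c_div, bits_per_char and length_check_fails are made up for this example. PYLONG_SHIFT = 30 and a 64-bit Py_ssize_t are assumptions about a typical x64 build, and signed overflow is modelled as two's-complement wraparound, which is also an assumption. This is not CPython's actual code.

```python
PYLONG_SHIFT = 30  # assumption: typical value on 64-bit CPython builds
SSIZE_BITS = 64    # assumption: Py_ssize_t is 64 bits wide


def wrap_ssize(value):
    """Wrap an integer as a two's-complement signed 64-bit value would."""
    value &= (1 << SSIZE_BITS) - 1
    if value >= 1 << (SSIZE_BITS - 1):
        value -= 1 << SSIZE_BITS
    return value


def c_div(a, b):
    """Integer division that truncates toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def bits_per_char(base):
    """Mirror the quoted loop: log2(base) for a power-of-two base."""
    n = base
    bits = -1
    while n:
        n >>= 1
        bits += 1
    return bits


def length_check_fails(num_digits, base):
    """Return True when the quoted overflow check would reject the input.

    num_digits plays the role of (p - start) in the C excerpt.
    """
    bpc = bits_per_char(base)
    n = wrap_ssize(num_digits * bpc + PYLONG_SHIFT - 1)
    return c_div(n, bpc) < num_digits
```

Under these assumptions, length_check_fails(10, 16) is False while length_check_fails(2**62, 16) is True, because the multiplication wraps around. Any characters skipped before p - start is measured never enter num_digits, which is the effect the answer attributes to leading zeros.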

Indeed, we are checking how many bits are needed to store this float, but Python is smart enough to remove leading zeros from this calculation. I don't see anything in the docs of the float function guaranteeing this behaviour across implementations. They ominously state:

Convert a string or number to a floating point number, if possible.

When Does This Not Work

When you write

   float("0." + "0"*(2**32))

it stops parsing for the base early on. All the remaining zeros are counted in the bit-length calculation and contribute to raising the ValueError.
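The gigabyte-sized strings from the question are awkward to reproduce, but a small-scale check suggests that none of these spellings is rejected for its shape alone, so the sheer length is what triggers the error. This sketch uses only the built-in float and the standard math module, is written for Python 3, and the lengths are arbitrary choices for illustration.

```python
import math

# Leading zeros only: parses to zero.
all_zeros = float("0" * 1000)

# Zeros after a decimal point: also parses at this size.
fractional_zeros = float("0." + "0" * 1000)

# A 400-digit integer string is too large for a double and becomes inf.
huge = float("10" * 200)

print(all_zeros, fractional_zeros, math.isinf(huge))
```

All three conversions succeed at these sizes. The ValueError in the question only shows up once the string is hundreds of megabytes long or more.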

Similar Parsing Tricks

Here's a similar case in the float class, where we find that whitespace is ignored (along with an interesting comment from the authors on their intent with this design choice):

while (Py_ISSPACE(*s))
    s++;

/* We don't care about overflow or underflow.  If the platform
 * supports them, infinities and signed zeroes (on underflow) are
 * fine. */
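The behaviour that comment describes is easy to observe from Python on platforms that support infinities and signed zeros. A short sketch using only the built-in float and the standard math module; the specific literals are just examples.

```python
import math

# Surrounding whitespace is ignored.
padded = float("   1.5  ")

# Overflow yields an infinity rather than an error.
overflowed = float("1e400")

# Underflow yields a zero, and the sign of the input is preserved.
underflowed = float("-1e-400")
sign_of_zero = math.copysign(1.0, underflowed)
```

Here padded equals 1.5, overflowed is infinite, and sign_of_zero is -1.0, which shows that the underflowed result is a negative zero.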


answered Oct 10 '22 21:10

en_Knight
