Why does float.__repr__ return a different representation compared to the equivalent formatting option?

Question

To see how repr(x) works for float in CPython, I checked the source code for float_repr:

buf = PyOS_double_to_string(PyFloat_AS_DOUBLE(v),
                            'r', 0,
                            Py_DTSF_ADD_DOT_0,
                            NULL);

This calls PyOS_double_to_string with format code 'r', which seems to be translated to format code 'g' with the precision set to 17:

precision = 17;
format_code = 'g';

So I'd expect repr(x) and f'{x:.17g}' to return the same representation. However, this doesn't seem to be the case:

>>> repr(1.1)
'1.1'
>>> f'{1.1:.17g}'
'1.1000000000000001'
>>> 
>>> repr(1.225)
'1.225'
>>> f'{1.225:.17g}'
'1.2250000000000001'

I understand that repr only needs to return as many digits as are necessary to reconstruct the exact same object as represented in memory, so '1.1' is obviously sufficient to get back 1.1. What I'd like to know is how (or why) this differs from the (internally used) .17g formatting option.
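To make the round-trip point concrete, here is a small sketch (the helper name compare is made up for illustration). It checks that both the short repr string and the 17-digit string parse back to the same double, so the two differ only in how many digits are printed:

```python
def compare(x):
    """Return (repr string, 17-significant-digit string) for a float."""
    short = repr(x)
    long_ = f"{x:.17g}"
    # Both strings must parse back to exactly the same double.
    assert float(short) == x
    assert float(long_) == x
    return short, long_


for value in (1.1, 1.225):
    print(compare(value))
```

On the Python 3.7.3 build from the question this prints the same pairs as the interactive session above.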

(Python 3.7.3)

Jean-François Fabre · Accepted Answer

It seems that you're looking at the fallback method:

/* The fallback code to use if _Py_dg_dtoa is not available. */

PyAPI_FUNC(char *) PyOS_double_to_string(double val,
                                         char format_code,
                                         int precision,
                                         int flags,
                                         int *type)
{
    char format[32];

The preprocessor macro that selects the fallback method is PY_NO_SHORT_FLOAT_REPR. If it is defined, the dtoa code isn't compiled at all, because it would fail:

/* if PY_NO_SHORT_FLOAT_REPR is defined, then don't even try to compile the following code */

That is probably not the case on most modern setups. This Q&A explains when and why Python selects either method: What causes Python's float_repr_style to use legacy?
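A quick way to check which of the two code paths your interpreter was built with is sys.float_repr_style, which is either 'short' (dtoa-based) or 'legacy' (the fallback). A minimal check:

```python
import sys

# 'short'  -> the _Py_dg_dtoa based code is in use
# 'legacy' -> the fallback (PY_NO_SHORT_FLOAT_REPR was defined at build time)
style = sys.float_repr_style
print(style)

if style == "short":
    # With the short style, repr gives the shortest round-tripping string.
    print(repr(1.1))
```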

Now, at line 947, you have the version that is used when _Py_dg_dtoa is available:

/* _Py_dg_dtoa is available. */


static char *
format_float_short(double d, char format_code,
                   int mode, int precision,
                   int always_add_sign, int add_dot_0_if_integer,
                   int use_alt_formatting, const char * const *float_strings,
                   int *type)

There you can see that 'g' and 'r' are handled slightly differently (explained in the comments):

We used to convert at 1e17, but that gives odd-looking results for some values when a 16-digit 'shortest' repr is padded with bogus zeros.

case 'g':
    if (decpt <= -4 || decpt >
        (add_dot_0_if_integer ? precision-1 : precision))
        use_exp = 1;
    if (use_alt_formatting)
        vdigits_end = precision;
    break;
case 'r':
    /* convert to exponential format at 1e16.  We used to convert
       at 1e17, but that gives odd-looking results for some values
       when a 16-digit 'shortest' repr is padded with bogus zeros.
       For example, repr(2e16+8) would give 20000000000000010.0;
       the true value is 20000000000000008.0. */
    if (decpt <= -4 || decpt > 16)
        use_exp = 1;
    break;
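The different exponent thresholds in the two cases above can be observed from Python. This is only a sketch and assumes sys.float_repr_style is 'short'; the helper name uses_exponent is made up for illustration:

```python
def uses_exponent(s):
    """True if the formatted float string is in exponential notation."""
    return "e" in s


# repr switches to exponential notation at 1e16; '.17g' only does so
# once the decimal exponent exceeds the requested precision of 17.
for x in (1e15, 1e16, 2e16 + 8, 1e17):
    r = repr(x)
    g = f"{x:.17g}"
    print(f"repr={r:<24} exp={uses_exponent(r)!s:<6} .17g={g:<20} exp={uses_exponent(g)}")
```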

This seems to match the behaviour you're describing. Note that "{:.16g}".format(1.225) yields '1.225'.
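As far as I can tell from pystrtod.c, the 'r' code also asks dtoa for the shortest string that round-trips, whereas 'g' asks for a fixed number of significant digits; treat that as my reading of the source rather than something shown in the excerpt above. The effect can be imitated in pure Python by searching for the smallest 'g' precision that still round-trips. The function name shortest_g is hypothetical, and this is an illustration, not how CPython computes repr:

```python
def shortest_g(x, max_digits=17):
    """Return (precision, string) for the smallest '.{p}g' format
    whose output parses back to exactly x."""
    for p in range(1, max_digits + 1):
        s = f"{x:.{p}g}"
        if float(s) == x:
            return p, s
    # 17 significant digits are always enough for an IEEE 754 double.
    return max_digits, f"{x:.{max_digits}g}"


for value in (1.1, 1.225):
    print(value, shortest_g(value), repr(value))
```

For the two values from the question, the digits found this way agree with repr, which is why '.16g' (or any smaller precision that still round-trips) looks like repr while '.17g' does not.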

Tags: python, floating-point, python-3.x, cpython

a_guest
