
Cached integers, the `is` operator and `id()` in Python 3.7

I used to show something like `print(5 is 7 - 2, 300 is 302 - 2)` in my Python talks when covering Python trivia. Today I realised that this example yields a (to me) unexpected result when run in Python 3.7.

We know that the integers from -5 to 256 are cached internally (see the Python 3 docs for `PyLong_FromLong`; the same note can be found in earlier API docs too).
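A minimal sketch of the small-integer cache, assuming CPython (other implementations may behave differently). The helper `add` is hypothetical and only exists so that the sums are computed at run time rather than folded by the compiler. The large value is deliberately far outside the cached range.

```python
def add(x, y):
    # The arguments are only known at run time, so the compiler
    # cannot fold the sum into a constant.
    return x + y

small_a = add(250, 6)        # 256, inside the cached range
small_b = add(255, 1)        # 256 again, computed separately
large_a = add(999_999, 1)    # 1_000_000, far outside the cached range
large_b = add(999_998, 2)    # 1_000_000 again, computed separately

small_shared = small_a is small_b   # CPython hands out the cached object
large_shared = large_a is large_b   # two distinct objects with equal value

print(small_shared, large_shared)
```

On CPython this is expected to report that the two 256 results are one object while the two 1_000_000 results are not, even though both pairs compare equal with `==`.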

The `is` operator (see the Python 3 docs on the `is` operator) tests object identity, i.e. it yields True when both operands are the same object, which is what matching `id()` values indicate.

The `id()` function is guaranteed to return a unique and constant value for an object during its lifetime (see the Python 3 docs on `id()`).
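As a small illustration of that guarantee, assuming CPython: for two objects that are alive at the same time, comparing their `id()` values gives the same answer as `is`. The helper `make_int` is hypothetical; it just builds integers at run time so that each call creates its own object.

```python
def make_int(text):
    # Parsing at run time creates a new int object on each call
    # (for values outside the small-integer cache).
    return int(text)

a = make_int("1000000")
b = make_int("1000000")

same_value = a == b                      # equal values
same_object = a is b                     # but two separate objects
ids_agree_with_is = (id(a) == id(b)) == (a is b)

print(same_value, same_object, ids_agree_with_is)
```

Both names stay bound for the whole comparison, so both objects are alive and their ids are guaranteed to be distinct. The guarantee says nothing about objects whose lifetimes do not overlap.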

All these rules give you the following results (as many Python coders know):

Python 2.7:

>>> print(5 is 7 - 2, 300 is 302 - 2)
True False

Python 3.6:

>>> print(5 is 7 - 2, 300 is 302 - 2)
True False

However, Python 3.7 behaves differently:

>>> print(5 is 7 - 2, 300 is 302 - 2)
True True

I tried to understand why, but I couldn't find any hints in the Python sources yet...

`id(302 - 2)` always yields a different value, so I am wondering why `302 - 2 is 300` yields True. How does the `is` operator know that the values are the same? Is it somehow overloaded for integer comparisons in Python 3.7?

>>> id(300)
140059023515344

>>> id(302 - 2)
140059037091600

>>> id(300) is id(302 - 2)
False

>>> 300 is 302 - 2
True

>>> id(300) == id(302 - 2)
True

>>> id(302 - 2)
140059037090320

>>> id(302 - 2)
140059023514640
tamasgal asked Apr 02 '19 21:04



1 Answer

`is` hasn't changed. No part of the language semantics has changed; whether the objects you're comparing are the same object was never specified behavior. The two sides of your `is` comparison simply happen to be the same object now. This is an effect of a change in the constant folding optimization.

Initial generation of a code object's `co_consts` reuses a single object for equivalent atomic constants. (I say "equivalent" instead of "equal" because 1 and 1.0 are equal but aren't equivalent.) This is a different effect from the caching of integers from -5 to 256, and it only applies within a single code object. Previously, the compile-time optimization pass that converts `302 - 2` to `300` happened in the bytecode peephole optimizer, which kicks in after initial `co_consts` generation and doesn't perform the same constant reuse.
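The "equivalent, not merely equal" distinction can be checked by inspecting `co_consts` directly. This is a sketch assuming CPython; the source string and the `<example>` filename are made up for illustration.

```python
# Two int literals with the same value and one float literal that
# compares equal to them, all in a single code object.
code = compile("a = 1000\nb = 1000.0\nc = 1000", "<example>", "exec")

ints = [c for c in code.co_consts if type(c) is int]
floats = [c for c in code.co_consts if type(c) is float]

# The two 1000 literals share one slot; 1000.0 keeps its own slot
# even though 1000 == 1000.0.
print(ints, floats)
```

The integer appears once although it is written twice, while the float is stored separately.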

In CPython 3.7, this optimization pass was moved from the bytecode peephole optimizer to a new AST optimizer. The AST optimizer takes effect before initial generation of a code object's `co_consts`, so constant reuse now applies to the results.
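A hedged way to observe this without typing `is` against literals: compile a small module in which one name is bound to the literal and another to the folded expression, then compare the resulting objects. This is a sketch assuming CPython; the variable names and the `<example>` filename are invented for illustration.

```python
import sys

# One code object containing the literal 300 and the expression 302 - 2.
code = compile("x = 300\ny = 302 - 2", "<example>", "exec")

namespace = {}
exec(code, namespace)

# If folding happens before constants are collected, both names end up
# bound to the same constant object.
folded_shared = namespace["x"] is namespace["y"]

print(sys.version_info[:2], folded_shared)
```

Per the explanation above, this should report a shared object on CPython 3.7 and later, and distinct objects on 3.6 and earlier.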


You can see the effects of constant reuse on older Python versions with something like

>>> 300 is 300
True

which produces True even on CPython 2.7 or 3.6, despite 300 being outside the range of the small integer cache. You can prevent constant reuse by ensuring that the constants you're comparing end up in separate code objects:

>>> (lambda: 300)() is 300
False

This produces False on any version of CPython, even with the new optimizer changes. However, it produces True on PyPy, because PyPy has its own optimization behavior, and PyPy behaves as if all equal integers are represented by the same integer object.
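The same contrast can be sketched with explicit code objects, assuming a standard CPython build. The filenames and variable names are hypothetical, and the value is chosen to be far outside the small-integer cache.

```python
# Two literals inside ONE code object: constant reuse applies.
single = compile("p = 1000000\nq = 1000000", "<single>", "exec")
ns = {}
exec(single, ns)
within = ns["p"] is ns["q"]

# The same literal in TWO separately compiled code objects:
# each compilation creates its own constant.
first = compile("value = 1000000", "<first>", "exec")
second = compile("value = 1000000", "<second>", "exec")
ns_first, ns_second = {}, {}
exec(first, ns_first)
exec(second, ns_second)
across = ns_first["value"] is ns_second["value"]

print(within, across)
```

On CPython the first comparison is expected to find one shared object and the second to find two distinct ones. As noted above, PyPy treats equal integers as identical, so it would not show this difference.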

user2357112 supports Monica answered Nov 16 '22 21:11
