Cached integers, the `is` operator and `id()` in Python 3.7

Question

I used to show something like print(5 is 7 - 2, 300 is 302 - 2) in my Python talks when discussing Python trivia. Today I realised that this example yields an unexpected (at least to me) result when run in Python 3.7.

We know that the integers from -5 to 256 are cached internally (Python 3 docs - PyLong_FromLong). The same note can be found in earlier API docs too.
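A quick way to observe the small-integer cache on CPython is to build the same value twice at run time, so that neither object comes from a code object's constants. The helper name `built_twice_is_same` is hypothetical, and `int(str(n))` is used only to force run-time construction:

```python
def built_twice_is_same(n: int) -> bool:
    """Return True if two int objects built at run time for n are identical.

    int(str(n)) constructs the value at run time, so it cannot come from a
    code object's co_consts; only CPython's small-int cache can make the two
    results the same object.
    """
    a = int(str(n))
    b = int(str(n))
    return a is b


for n in (0, 100, 256, 257, 1000):
    print(n, built_twice_is_same(n))
```

On CPython this should report True up to 256 and False from 257 on; other implementations (e.g. PyPy) may behave differently.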

The is operator (as described in the docs, Python 3 docs - is operator) tests object identity, i.e. it yields True only when both operands are the same object, which is equivalent to their id() values matching.

The id() function is guaranteed to return a unique and constant value for an object during its lifetime (also described in the docs Python 3 docs - id()).
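The relationship between `is`, `==` and `id()` can be checked directly. This small sketch uses lists rather than ints, so neither integer caching nor constant reuse is involved: `==` compares values, `is` compares identity, and while the objects are alive `a is b` agrees with `id(a) == id(b)`.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal value, but a separate object
c = a          # a second name for the same object

# == compares values; `is` compares identity.
assert a == b and a is not b
assert a is c

# While all three objects are alive, `is` agrees with comparing id() values.
assert (a is b) == (id(a) == id(b))
assert (a is c) == (id(a) == id(c))
print(id(a), id(b), id(c))
```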

All these rules give you the following results (as many Python coders know):

Python 2.7:

>>> print(5 is 7 - 2, 300 is 302 - 2)
(True, False)

Python 3.6:

>>> print(5 is 7 - 2, 300 is 302 - 2)
True False

However, Python 3.7 behaves differently:

>>> print(5 is 7 - 2, 300 is 302 - 2)
True True

I tried to understand why, but I haven't found any hints in the Python sources yet...

id(302 - 2) always yields a different value, so I am wondering why 302 - 2 is 300 yields True. How does the is operator know that the values are the same? Is this somehow overloaded for integer comparisons in Python 3.7?

>>> id(300)
140059023515344

>>> id(302 - 2)
140059037091600

>>> id(300) is id(302 - 2)
False

>>> 300 is 302 - 2
True

>>> id(300) == id(302 - 2)
True

>>> id(302 - 2)
140059037090320

>>> id(302 - 2)
140059023514640
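Two details in the transcript can be reproduced in isolation. First, `id()` returns a plain int, and on CPython that value (typically a memory address) lies far outside the small-integer cache, so each call creates a fresh int object; comparing two id() results with `is` therefore tests those fresh ints, not the original objects. Second, `id(302 - 2)` varies between REPL lines because each input is compiled into a new code object whose constant 300 is freed once the line has run. A minimal sketch of the first point, assuming CPython:

```python
x = object()

# Equal values: both calls report the same identity for x.
assert id(x) == id(x)

# But each id() call returns a new int object (on CPython the value is far
# outside the -5..256 cache), so `is` compares two distinct ints.
print(id(x) is id(x))
```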

user2357112 supports Monica · Accepted Answer

is hasn't changed. No part of the language semantics has changed; whether the objects you're comparing are the same object was never specified behavior. The two sides of your is comparison simply happen to be the same object now. This is an effect of a change in constant folding optimization.

Initial generation of a code object's co_consts reuses a single object for equivalent atomic constants. (I say "equivalent" instead of "equal" because 1 and 1.0 are equal but not equivalent.) This is a different effect from the caching of integers from -5 to 256, and it only applies within a single code object. Previously, the compile-time optimization pass that converts 302 - 2 to 300 happened in the bytecode peephole optimizer, which kicks in after initial co_consts generation and doesn't do the same constant reuse.
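Constant reuse within one code object can be inspected with `compile()` and the code object's `co_consts` attribute. A sketch, assuming CPython; the exact layout of `co_consts` varies between versions (newer releases, for example, may load some small ints without storing them there):

```python
# Two assignments in ONE code object: the compiler stores the constant 300
# once in co_consts, so both names end up bound to the same object.
code = compile("a = 300\nb = 300", "<demo>", "exec")
ns = {}
exec(code, ns)
print(code.co_consts)
print(ns["a"] is ns["b"])

# 1 and 1.0 are equal but not equivalent, so they are not merged.
code2 = compile("c = 1\nd = 1.0", "<demo>", "exec")
ns2 = {}
exec(code2, ns2)
print(code2.co_consts)
print(ns2["c"] == ns2["d"], ns2["c"] is ns2["d"])
```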

In CPython 3.7, this optimization pass was moved from the bytecode peephole optimizer to a new AST optimizer. The AST optimizer takes effect before initial generation of a code object's co_consts, so constant reuse now applies to the results.
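The effect of the AST optimizer can be seen by compiling the folded expression and the literal into the same code object, mirroring the REPL line `300 is 302 - 2`. A sketch, assuming CPython; the comments restate the explanation above rather than anything verified on every version:

```python
import sys

# Both assignments live in ONE code object.
code = compile("x = 302 - 2\ny = 300", "<demo>", "exec")
ns = {}
exec(code, ns)

print(sys.version_info[:2], code.co_consts)
# On CPython 3.7+ the AST optimizer folds 302 - 2 before co_consts is built,
# so only one 300 constant exists and this prints True. On 3.6 and earlier the
# peephole optimizer appends a separate 300 object, giving False.
print(ns["x"] is ns["y"])
```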

You can see the effects of constant reuse on older Python versions by doing something like:

>>> 300 is 300
True

which produces True even on CPython 2.7 or 3.6, despite 300 being outside the range of the small integer cache. You can prevent constant reuse by ensuring that the constants you're comparing end up in separate code objects:

>>> (lambda: 300)() is 300
False

This produces False on any version of CPython, even with the new optimizer changes. However, it produces True on PyPy, because PyPy has its own optimization behavior, and PyPy behaves as if all equal integers are represented by the same integer object.
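The separate-code-object case can be inspected the same way: the lambda body is compiled into its own code object, with its own `co_consts`, stored as a constant of the enclosing code object. A sketch using `types.CodeType` to find it; the identity result is printed rather than asserted, since it depends on the implementation:

```python
import platform
import types

# The lambda body gets its own code object with its own co_consts,
# separate from the enclosing module-level code object.
outer = compile("f = lambda: 300\ng = 300", "<demo>", "exec")
inner = next(c for c in outer.co_consts if isinstance(c, types.CodeType))
print(inner.co_consts, outer.co_consts)

ns = {}
exec(outer, ns)
# Two separate constant tables: per the answer above, CPython gives False
# here, while PyPy gives True.
print(platform.python_implementation(), ns["f"]() is ns["g"])
```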

Tags:

python

python-3.7

tamasgal
