I used to show something like print(5 is 7 - 2, 300 is 302 - 2) in my Python talks when covering Python trivia. Today I realised that this example yields a (to me) unexpected result when run in Python 3.7.
We know that the integers from -5 to 256 are cached internally (Python 3 docs - PyLong_FromLong); the same note can be found in earlier API docs too.
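To see the small integer cache on its own, without any compile-time effects getting in the way, the integers can be built at run time. This is only a sketch, and it relies on a CPython implementation detail; same_object is a made-up helper name.

```python
def same_object(text):
    # int(text) builds the value at run time, so no compile-time
    # constant handling is involved in the comparison.
    return int(text) is int(text)

# On CPython, values inside the cached range come back as one shared object,
# while values just outside it are fresh objects each time.
for text in ("-6", "-5", "256", "257"):
    print(text, same_object(text))
```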
The is operator (as described in the docs: Python 3 docs - is operator) tests object identity, i.e. it compares what id() reports for both operands and yields True when the values match.
The id() function is guaranteed to return a unique and constant value for an object during its lifetime (also described in the docs: Python 3 docs - id()).
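The "during its lifetime" part matters: id() values can only be compared meaningfully while both objects are still alive. A minimal sketch, assuming nothing beyond the documented guarantee (build is a made-up helper):

```python
def build():
    # Built at run time so the compiler cannot share a constant.
    return int("300")

# Both names keep their objects alive, so the ids cannot be recycled.
first = build()
second = build()

identity = first is second
ids_match = id(first) == id(second)

# For two live objects these always agree, whatever the interpreter.
print(identity, ids_match)
```

Comparing the ids of two temporaries, as in id(x) == id(y) where neither result is kept, gives no such guarantee, because the first object may already be gone when the second one is created.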
All these rules give you the following results (as many Python coders know):
Python 2.7:
>>> print(5 is 7 - 2, 300 is 302 - 2)
True False
Python 3.6:
>>> print(5 is 7 - 2, 300 is 302 - 2)
True False
However, Python 3.7 behaves differently:
>>> print(5 is 7 - 2, 300 is 302 - 2)
True True
I tried to understand why, but I couldn't find any hints in the Python sources yet...
id(302 - 2) always yields a different value, so I am wondering why 302 - 2 is 300 yields True. How does the is operator know that the values are the same? Is this somehow overloaded for integer comparisons in Python 3.7?
>>> id(300)
140059023515344
>>> id(302 - 2)
140059037091600
>>> id(300) is id(302 - 2)
False
>>> 300 is 302 - 2
True
>>> id(300) == id(302 -2)
True
>>> id(302 - 2)
140059037090320
>>> id(302 - 2)
140059023514640
The is operator hasn't changed. No part of the language semantics has changed; whether the objects you're comparing are the same object was never specified behavior. The two sides of your is comparison simply happen to be the same object now. This is an effect of a change in the constant folding optimization.
Initial generation of a code object's co_consts reuses a single object for equivalent atomic constants. (I say "equivalent" instead of "equal" because 1 and 1.0 are equal but aren't equivalent.) This is a different effect from the caching of integers from -5 to 256, and it only applies within a single code object. Previously, the compile-time optimization pass that converts 302 - 2 to 300 happened in the bytecode peephole optimizer, which kicks in after initial co_consts generation and doesn't do the same constant reuse.
In CPython 3.7, this optimization pass was moved from the bytecode peephole optimizer to a new AST optimizer. The AST optimizer takes effect before initial generation of a code object's co_consts, so constant reuse now applies to the results.
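If you want to look at this yourself, you can compile a small snippet and inspect its co_consts. This is just a sketch; the exact contents of the tuple differ between CPython versions, so it prints them rather than assuming a particular layout.

```python
# Two assignments compiled together, so they share one code object.
source = "a = 300\nb = 302 - 2\n"
code = compile(source, "<demo>", "exec")

# co_consts holds the constants the compiled bytecode refers to.
# Check whether 300 shows up once or more than once on your version.
print(code.co_consts)

# Run the code and see whether both names ended up on the same object.
namespace = {}
exec(code, namespace)
print(namespace["a"] is namespace["b"])
```

Each line typed at the interactive prompt is compiled separately, which is why repeated id(302 - 2) calls on separate lines are not covered by this reuse.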
You can see the effects of constant reuse on old Python versions by doing something like
>>> 300 is 300
True
which produces True even on CPython 2.7 or 3.6, despite 300 being outside the range of the small integer cache. You can prevent constant reuse by ensuring that the constants you're comparing end up in separate code objects:
>>> (lambda: 300)() is 300
False
This produces False on every CPython version discussed here, including 3.7 with the new optimizer changes. However, it produces True on PyPy, because PyPy has its own optimization behavior, and PyPy behaves as if all equal integers are represented by the same integer object.
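The "separate code objects" part can be made visible: a lambda gets its own code object, with its own co_consts, stored as a constant of the enclosing code. A rough sketch, assuming the snippet is compiled as one module (the names are made up for illustration):

```python
import types

# One module-level code object, plus a nested one for the lambda body.
source = "f = lambda: 300\nx = 300\n"
module_code = compile(source, "<demo>", "exec")

# The lambda's code object is itself stored among the module's constants.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
lambda_code = nested[0]

# Each code object carries its own constants tuple.
print(module_code.co_consts)
print(lambda_code.co_consts)
```

Whether the two tuples end up holding the very same 300 object is an implementation detail, so it is better checked on your own interpreter than assumed.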