
Why does Python handle '1 is 1**2' differently from '1000 is 10**3'?

Inspired by this question about caching small integers and strings I discovered the following behavior which I don't understand.

>>> 1000 is 10**3
False

I thought I understood this behavior: 1000 is too big to be cached, so 1000 and 10**3 point to two different objects. But I had it wrong:

>>> 1000 is 1000
True

So, maybe Python treats calculations differently from 'normal' integers. But that assumption is also not correct:

>>> 1 is 1**2
True

How can this behavior be explained?

OrangeTux asked Feb 19 '14 12:02



1 Answer

There are two separate things going on here: Python stores int literals (and other literals) as constants alongside the compiled bytecode, and small integer objects are cached as singletons.

When you run 1000 is 1000, only one such constant is stored, and it is reused for both operands. You are really looking at the same object:

>>> import dis
>>> compile('1000 is 1000', '<stdin>', 'eval').co_consts
(1000,)
>>> dis.dis(compile('1000 is 1000', '<stdin>', 'eval'))
  1           0 LOAD_CONST               0 (1000)
              3 LOAD_CONST               0 (1000)
              6 COMPARE_OP               8 (is)
              9 RETURN_VALUE

Here LOAD_CONST refers to the constant at index 0; you can see the stored constants in the .co_consts attribute of the bytecode object.
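The same reuse of a stored constant can be observed without the REPL. The following is only a minimal sketch: the function name same_literal_twice is made up for illustration, and the exact layout of the co_consts tuple differs between Python versions, so only the count of 1000 is inspected. Using names instead of comparing two literals directly also avoids the SyntaxWarning that newer Python versions emit for "is" with a literal.

```python
def same_literal_twice():
    # Both assignments use the literal 1000 inside one code object,
    # so the compiler only needs to store that constant once.
    a = 1000
    b = 1000
    return a is b


# co_consts holds the constants stored with this function's bytecode.
consts = same_literal_twice.__code__.co_consts
print(consts)              # layout varies between Python versions
print(consts.count(1000))  # how many times 1000 appears in the tuple
print(same_literal_twice())
```

Because both names are loaded from the same slot in co_consts, the identity check inside the function compares an object with itself.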

Compare this to the 1000 is 10 ** 3 case:

>>> compile('1000 is 10**3', '<stdin>', 'eval').co_consts
(1000, 10, 3, 1000)
>>> dis.dis(compile('1000 is 10**3', '<stdin>', 'eval'))
  1           0 LOAD_CONST               0 (1000)
              3 LOAD_CONST               3 (1000)
              6 COMPARE_OP               8 (is)
              9 RETURN_VALUE

There is a peephole optimization that pre-computes expressions on constants at compile time, and this optimization has replaced 10 ** 3 with 1000, but the optimization doesn't re-use pre-existing constants. As a result, the two LOAD_CONST opcodes load the constants at index 0 and index 3, and these are two different int objects.
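The compile-time folding itself can be checked in isolation. This is a rough sketch, assuming CPython's default compile settings; the names folded and unfolded and the "<demo>" filename are placeholders. It only checks whether the pre-computed value 1000 ends up among the stored constants. Whether that folded constant is the same object as an existing 1000 literal may differ between CPython versions, so that part is deliberately not shown.

```python
# An expression made only of constants can be pre-computed by the compiler.
folded = compile("10 ** 3", "<demo>", "eval")

# With a name involved, the value is unknown until run time, so nothing is folded.
unfolded = compile("x ** 3", "<demo>", "eval")

print(folded.co_consts)    # exact tuple contents depend on the Python version
print(unfolded.co_consts)
print(1000 in folded.co_consts)
print(1000 in unfolded.co_consts)
```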

Then there is a second optimisation: CPython interns small integers. Only one copy of the 1 object is ever created during the lifetime of a Python program, and the same applies to all integers from -5 to 256 inclusive.

Thus, for the 1 is 1**2 case, the Python internals use a singleton int() object from the internal cache. This is a CPython implementation detail.
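To look at the small-integer cache without compile-time constants getting in the way, the integers can be built at run time. The sketch below is illustrative only: compare is a hypothetical helper, and the identity column reflects a CPython implementation detail that other interpreters or versions may not share. The equality column is the only part the language guarantees.

```python
def compare(text):
    """Parse the same decimal string twice and compare the two results."""
    a = int(text)  # created at run time, so no code-object constant is shared
    b = int(text)
    return a == b, a is b


for text in ("1", "256", "257", "1000"):
    equal, identical = compare(text)
    print(text, "==:", equal, "is:", identical)
```

On CPython, values inside the cached range typically report identity as True, while larger values usually do not; the == result is True in every case.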

The moral of this story is that you should never use is when you really want to compare by value. Use == for integers, always.

Martijn Pieters answered Oct 05 '22 23:10