>>> a = [1234, 1234]  # list
>>> a
[1234, 1234]
>>> id(a[0])
38032480
>>> id(a[1])
38032480
>>> b = 1234  # b is a variable of integer type
>>> id(b)
38032384
Why is id(b) not the same as id(a[0]) and id(a[1]) in Python?
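When the CPython REPL executes a line, it will:
1. parse and compile the line into a code object (bytecode plus a table of constants), and
2. execute that bytecode.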
The compilation result can be inspected with the dis module:
>>> import dis
>>> dis.dis('a = [1234, 1234, 5678, 90123, 5678, 4321]')
  1           0 LOAD_CONST               0 (1234)
              2 LOAD_CONST               0 (1234)
              4 LOAD_CONST               1 (5678)
              6 LOAD_CONST               2 (90123)
              8 LOAD_CONST               1 (5678)
             10 LOAD_CONST               3 (4321)
             12 BUILD_LIST               6
             14 STORE_NAME               0 (a)
             16 LOAD_CONST               4 (None)
             18 RETURN_VALUE
Note that all the 1234s are loaded with "LOAD_CONST 0", and all the 5678s are loaded with "LOAD_CONST 1". These indices refer to the constant table associated with the code object. Here, the table is (1234, 5678, 90123, 4321, None).
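A minimal sketch for looking at the constant table directly, assuming CPython 3.x: it compiles one line in "single" mode (the mode the interactive interpreter uses) and prints co_consts together with the index argument of each LOAD_CONST. The exact opcodes differ between Python versions, so the listing may not match the dis output above line for line.

```python
import dis

# Compile one line as the interactive interpreter would ("single" mode):
# the result is a single code object with its own constant table.
code = compile("a = [1234, 1234]", "<stdin>", "single")

print("co_consts:", code.co_consts)

# Each LOAD_CONST argument is an index into code.co_consts.
for instr in dis.get_instructions(code):
    if instr.opname == "LOAD_CONST":
        print(instr.offset, instr.opname, instr.arg, repr(instr.argval))

# Collect the table indices used for the literal 1234.
indices_for_1234 = {
    instr.arg
    for instr in dis.get_instructions(code)
    if instr.opname == "LOAD_CONST" and instr.argval == 1234
}
print("indices used for 1234:", indices_for_1234)
```

If both literals share one index, they share one object, which is exactly what id() reported in the question.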
The compiler recognizes that all occurrences of 1234 in the code object are equal, so it allocates only one object for all of them.
Therefore, as the OP observed, a[0] and a[1] do indeed refer to the same object: the same constant from the constant table of that line's code object.
When you then execute b = 1234, that line is compiled and executed on its own, independently of the previous line, so a different object is allocated.
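The same effect can be reproduced outside the REPL. This is a sketch assuming CPython: each "REPL line" is compiled separately with compile() and executed in one shared namespace dict (the names line1, line2 and ns are just for illustration).

```python
# Each string below plays the role of one line typed at the REPL.
# compile() turns each into its own code object, as the REPL does.
line1 = compile("a = [1234, 1234]", "<stdin>", "single")
line2 = compile("b = 1234", "<stdin>", "single")

ns = {}  # hypothetical shared namespace standing in for the REPL's globals
exec(line1, ns)
exec(line2, ns)

a, b = ns["a"], ns["b"]
print(a[0] is a[1])  # True in CPython: both come from line1's single constant
print(a[0] is b)     # b's 1234 comes from line2's separate constant table
print(id(a[0]), id(a[1]), id(b))
```

The second comparison is deliberately left without a promised result: whether two code objects end up sharing an int constant is up to the implementation.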
(See http://akaptur.com/blog/categories/python-internals/ for a brief introduction to how code objects are interpreted.)
Outside of the REPL, when you execute a *.py file, the whole module body is compiled into one code object and each function (including each lambda) into its own separate code object, so when we run:
a = [1234, 1234]
b = 1234
print(id(a[0]), id(a[1]))
print(id(b))
a = (lambda: [1234, 1234])()
b = (lambda: 1234)()
print(id(a[0]), id(a[1]))
print(id(b))
We may see something like:
4415536880 4415536880
4415536880
4415536912 4415536912
4415537104
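At module level, a[0], a[1] and b all share the address 4415536880, because they come from the same constant in the module's code object. In the lambda version, a[0] and a[1] share the address 4415536912, the constant of the first lambda, while b has the address 4415537104, the constant of the second lambda.
Also note that these results apply to CPython only. Other implementations have different strategies for allocating constants. For instance, running the above code in PyPy gives: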
19745 19745
19745
19745 19745
19745
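Back in CPython, the per-function constant tables can be inspected directly. A sketch, assuming CPython 3.x: it compiles the script from above (without its print() calls; the string SOURCE is only for illustration) and pulls the nested lambda code objects out of the module's co_consts.

```python
import types

# The script from above, minus the print() calls (illustrative only).
SOURCE = """
a = [1234, 1234]
b = 1234
a = (lambda: [1234, 1234])()
b = (lambda: 1234)()
"""

# The whole module body becomes one code object...
module_code = compile(SOURCE, "<script>", "exec")

# ...and each lambda is a separate code object stored among its constants.
lambdas = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
module_plain = [c for c in module_code.co_consts if not isinstance(c, types.CodeType)]

print("module constants:", module_plain)
for number, fn_code in enumerate(lambdas, start=1):
    print(f"lambda #{number} ({fn_code.co_name}) constants:", fn_code.co_consts)
```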
There is no rule or guarantee stating that id(b) should equal id(a[0]), or even that id(a[0]) should equal id(a[1]), so the question itself is moot. The question you should be asking is why id(a[0]) and id(a[1]) are in fact the same.
If you do a.append(1234) followed by id(a[2]), you may or may not get the same id. As @hiro protagonist has pointed out, these are just internal optimizations that you shouldn't depend on.
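A small sketch of the safe alternative: compare values with == and treat is/id() results as an implementation detail. Here int("1234") is just an arbitrary way to produce an int at run time rather than taking it from a compile-time constant.

```python
a = [1234, 1234]

# int("1234") is an arbitrary way to create an int at run time,
# instead of reusing a compile-time constant from the code object.
a.append(int("1234"))

print(a[2] == a[0])          # True: the values are equal
print(a[2] is a[0])          # implementation detail: may be True or False
print(id(a[2]) == id(a[0]))  # same test as the line above
```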
A Python list is very much unlike a C array.
A C array is just a block of contiguous memory, so the address of its first (0-th) element is, by definition, the address of the array itself. Array access in C is just pointer arithmetic, and the [] notation is just a thin crust of syntactic sugar over that pointer arithmetic: x[i] means *(x + i). In a function parameter list, the declaration int x[] is just another form of int *x.
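The pointer arithmetic can be mimicked from Python with the standard ctypes module. This is a sketch assuming a platform where ctypes is available: it allocates a genuine C array of three ints and reads each element back from base address + i * element size, which is what *(x + i) computes in C.

```python
import ctypes

# A genuine C array of three ints: one contiguous block of memory.
IntArray3 = ctypes.c_int * 3
arr = IntArray3(10, 20, 30)

base = ctypes.addressof(arr)        # address of the array == address of element 0
step = ctypes.sizeof(ctypes.c_int)  # bytes per element (architecture-dependent)

for i in range(len(arr)):
    # Same computation as C's *(x + i): base + i * sizeof(int).
    elem = ctypes.c_int.from_address(base + i * step)
    print(i, hex(base + i * step), elem.value)
```

Nothing like this works for a Python list, whose memory holds references to separate objects rather than the values themselves.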
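For the sake of the example, let's assume that in Python, id(x) is the "memory address of x", as &x would be in C. (This is not true for all Python implementations: the language only guarantees that id() returns an integer that is unique and constant for an object during its lifetime. CPython happens to use the object's memory address, but only as an implementation detail.)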
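What the language does guarantee can be checked with a short sketch: two objects that are alive at the same time never share an id, and x is y is the same test as id(x) == id(y).

```python
x = [1, 2, 3]
y = [1, 2, 3]

# Two separate list displays create two distinct list objects.
print(x == y)          # True: equal contents
print(x is y)          # False: different objects
print(id(x) == id(y))  # False: both are alive, so their ids must differ
```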
In C, an int is just an architecture-dependent number of bytes, so for int x = 1 the expression &x points to these bytes.
Everything in Python is an object, including numbers. This is why the literal 1 denotes an object of type int describing the number 1, and id(1) identifies that object. You can call its methods: (1).__str__() returns the string '1'.
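A quick check that integers are full objects with methods. The parentheses around 1 matter: 1.__str__() is a syntax error, because the tokenizer reads "1." as the start of a float literal.

```python
n = 1
print(type(n))                # <class 'int'>
print(isinstance(n, object))  # True: ints are objects like everything else
print((1).__str__())          # prints 1, i.e. the string '1'
print((1).bit_length())       # 1: number of bits needed to represent it
```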
So, when you have x = [1, 2, 3], id(x) is a "pointer" to a list object with three elements. The list object itself is pretty complex. But x[0] is not the bytes that comprise the integer value 1; internally it is a reference to an int object for the number 1. Thus id(x[0]) is a "pointer" to that object.
In C terms, the elements of the list can be seen as pointers to the objects stored in it, not as the objects themselves.
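Because list slots hold references rather than values, two slots can point at the same object. A mutable element makes this visible (the names inner and x are just for illustration):

```python
inner = [0]           # a mutable object, so changes are easy to see
x = [inner, inner]    # both slots hold a reference to the same list

x[0].append(1)        # mutate the object through the first slot

print(x[1])                   # [0, 1]: the second slot sees the change
print(x[0] is x[1] is inner)  # True: one object, referenced twice
print(id(x[0]) == id(inner))  # True
```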
Since there's no point in having two objects representing the same number 1, in CPython id(1) is always the same during an interpreter run. An illustration:
x = [1, 2, 3]
y = [1, 100, 1000]
assert id(x) != id(y) # obviously
assert id(x[0]) == id(y[0]) == id(1) # yes, the same int object
CPython actually preallocates objects for the most commonly used small integers, -5 through 256 (see comments here). For larger numbers it does not, which can lead to two 'copies' of a larger number having different id() values.
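A sketch of the cache boundary, assuming standard CPython: the numbers are built at run time with int(), so the compiler cannot merge each pair into one shared compile-time constant the way it did for the two 1234s in a single line.

```python
import sys

# Built at run time with int(), so the compiler cannot fold each pair
# into one shared compile-time constant.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")

print(sys.implementation.name)
print(small_a == small_b, big_a == big_b)  # True True: values always match
print(small_a is small_b)  # True in CPython: 256 comes from the preallocated cache
print(big_a is big_b)      # typically False in CPython: two separate objects
```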