Python: how variables are stored in memory

Tags:

python

>>> a = [1234, 1234]  # list
>>> a
[1234, 1234]
>>> id(a[0])
38032480
>>> id(a[1])
38032480
>>> b = 1234  # b is a variable of integer type
>>> id(b)
38032384

Why is id(b) not the same as id(a[0]) and id(a[1]) in Python?

Aman Tyagi asked Apr 08 '17 13:04


3 Answers

When the CPython REPL executes a line, it will:

  1. parse, and compile it to a code object of bytecode, and then
  2. execute the bytecode.

The compilation result can be checked through the dis module:

>>> import dis
>>> dis.dis('a = [1234, 1234, 5678, 90123, 5678, 4321]')
  1           0 LOAD_CONST               0 (1234)
              2 LOAD_CONST               0 (1234)
              4 LOAD_CONST               1 (5678)
              6 LOAD_CONST               2 (90123)
              8 LOAD_CONST               1 (5678)
             10 LOAD_CONST               3 (4321)
             12 BUILD_LIST               6
             14 STORE_NAME               0 (a)
             16 LOAD_CONST               4 (None)
             18 RETURN_VALUE

Note that all 1234s are loaded with "LOAD_CONST 0", and all 5678s are loaded with "LOAD_CONST 1". The numbers 0 and 1 are indexes into the constant table associated with the code object. Here, the table is (1234, 5678, 90123, 4321, None).

The compiler knows that all the copies of 1234 in the code object are the same, so it allocates only one object for all of them.

Therefore, as OP observed, a[0] and a[1] do indeed refer to the same object: the same constant from the constant table of the code object of that line of code.

When you execute b = 1234, that line is again compiled and executed, independently of the previous line, so a different object is allocated for its 1234.
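The REPL behaviour described above can be approximated in a script by compiling each line separately. This is only a sketch: run_line and the "<demo>" filename are names made up for the illustration, and the identity results are CPython implementation details rather than language guarantees.

```python
def run_line(source, namespace):
    """Compile one line into its own code object, then execute it."""
    code = compile(source, "<demo>", "single")
    exec(code, namespace)
    return code


ns = {}
first = run_line("a = [1234, 1234]", ns)
second = run_line("b = 1234", ns)

# Each compiled line carries its own constant table.
print(first.co_consts)
print(second.co_consts)

# Both list elements come from the same constant of the first code object.
same_within_line = ns["a"][0] is ns["a"][1]
# The second line was compiled separately, so it normally gets its own 1234.
same_across_lines = ns["a"][0] is ns["b"]
print(same_within_line, same_across_lines)
```

On a standard CPython build the first flag is expected to be true and the second false, which matches the ids in the question.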

(You may read http://akaptur.com/blog/categories/python-internals/ for a brief introduction to how code objects are interpreted.)


Outside of the REPL, when you execute a *.py file, the module's top-level code is compiled into one code object and each function body into its own separate code object, so when we run:

a = [1234, 1234]
b = 1234
print(id(a[0]), id(a[1]))
print(id(b))

a = (lambda: [1234, 1234])()
b = (lambda: 1234)()
print(id(a[0]), id(a[1]))
print(id(b))

We may see something like:

4415536880 4415536880
4415536880
4415536912 4415536912
4415537104

  • The first three numbers share the same address, 4415536880: they all refer to the single 1234 constant of the "__main__" (module-level) code object.
  • The next two, a[0] and a[1], share the address 4415536912: the 1234 constant of the first lambda's code object.
  • The last one, b, has the address 4415537104: the 1234 constant of the second lambda's code object.
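To see the separate code objects directly, one can compile a similar script and look for nested code objects in the module's constant table. This is a sketch for CPython; the names f and g and the "<demo>" filename are invented for the example, and the exact contents of each co_consts tuple vary between versions.

```python
import types

source = """\
a = [1234, 1234]
b = 1234
f = lambda: [1234, 1234]
g = lambda: 1234
"""

# The module body becomes one code object.
module_code = compile(source, "<demo>", "exec")

# Each lambda body is stored as a separate code object among the module's constants.
function_codes = [
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
]

print(module_code.co_name, module_code.co_consts)
for code in function_codes:
    # Every function code object has a constant table of its own.
    print(code.co_name, code.co_consts)
```

Because every code object owns its constant table, the 1234 inside a lambda is not the 1234 of the module body.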

Also note that this result is valid for CPython only. Other implementations have different strategies for allocating constants. For instance, running the above code in PyPy gives:

19745 19745
19745
19745 19745
19745
kennytm answered Oct 16 '22 14:10


There is no rule or guarantee stating that id(a[0]) should be equal to id(a[1]), so the question as asked is moot. The question you should be asking is why id(a[0]) and id(a[1]) are in fact the same.
If you do a.append(1234) followed by id(a[2]), you may or may not get the same id. As @hiro protagonist has pointed out, these are just internal optimizations that you shouldn't depend upon.
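A small sketch of the practical consequence: compare values with ==, which the language defines, and treat the result of is on integers as incidental. The variable names are illustrative only.

```python
a = [1234, 1234]
a.append(int("1234"))  # built at run time, not taken from a constant table

# Equality of values is guaranteed by the language.
values_equal = a[0] == a[1] == a[2]

# Identity is an implementation detail and may come out either way.
identity_shared = a[2] is a[0]

print(values_equal, identity_shared)
```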

tomatoRadar answered Oct 16 '22 15:10


A Python list is very much unlike a C array.

A C array is just a block of contiguous memory, so the address of its first (0-th) element is the address of the array itself, by definition. Array access in C is just pointer arithmetic, and the [] notation is just a thin crust of syntactic sugar over that pointer arithmetic. In a function parameter list, int x[] is just another form of int *x.

For the sake of the example, let's assume that in Python, id(x) is the "memory address of x", as &x would be in C. (That is how CPython happens to implement it, but it is not true for all Python implementations. The language only promises a number that is unique among objects alive at the same time.)

In C, an int is just an architecture-dependent number of bytes, so for int x = 1 the expression &x points to these bytes. In Python, everything is an object, including numbers. This is why id(1) refers to an object of type int describing the number 1. You can call its methods: (1).__str__() returns the string '1'.

So, when you have x = [1, 2, 3], id(x) is a "pointer" to a list object with three elements. The list object itself is pretty complex. But x[0] is not the bytes that comprise the integer value 1; it's internally a reference to an int object for number 1. Thus id(x[0]) is a "pointer" to that object.

In C terms, the elements of the array could be seen as pointers to the objects stored in it, not the objects themselves.
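A short sketch makes the "elements are references" point visible by storing one mutable object in two slots of a list. The names shared and slots are made up for the illustration.

```python
shared = [0]
slots = [shared, shared]  # two references to one object, not two copies

# Mutate the object through the first slot...
slots[0].append(1)

# ...and the change is visible through the second slot and the original name.
same_object = slots[0] is slots[1]
print(slots[1], shared, same_object)
```

If the list held copies of the object instead of references, the append would have affected only one slot.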

Since there's no point in having two objects representing the same number 1, CPython keeps a single one, and id(1) stays the same throughout an interpreter run. An illustration:

x = [1, 2, 3]
y = [1, 100, 1000]

assert id(x) != id(y)  # obviously
assert id(x[0]) == id(y[0]) == id(1) # yes, the same int object

CPython actually preallocates objects for a few of the most-used small numbers, currently the integers from -5 to 256 (see comments here). For larger numbers this is not the case, which can lead to two 'copies' of a larger number having different id() values.
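The difference between small and large integers can be probed with values built at run time, so that the compiler's constant table plays no part. This is a sketch of CPython behaviour only; fresh_int is a hypothetical helper, and other implementations may report different results.

```python
import sys


def fresh_int(text):
    """Build an int at run time so it does not come from a constant table."""
    return int(text)


# Small values come from CPython's preallocated cache, so both calls yield one object.
small_shared = fresh_int("7") is fresh_int("7")

# Larger values are normally allocated anew on each call.
large_shared = fresh_int("1234") is fresh_int("1234")

print(sys.implementation.name, small_shared, large_shared)
```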

9000 answered Oct 16 '22 14:10