
What does CPython actually do when "=" is performed on primitive type variables?

For instance:

a = some_process_that_generates_integer_result()
b = a

Someone told me that b and a will point to the same integer object, so assigning to b would modify the reference count of that object. The code is executed in the function PyObject* ast2obj_expr(void* _o) in Python-ast.c:

static PyObject* ast2obj_object(void *o)
{
    if (!o)
        o = Py_None;
    Py_INCREF((PyObject*)o);
    return (PyObject*)o;
}

......

case Num_kind:
    result = PyType_GenericNew(Num_type, NULL, NULL);
    if (!result) goto failed;
    value = ast2obj_object(o->v.Num.n);
    if (!value) goto failed;
    if (PyObject_SetAttrString(result, "n", value) == -1)
            goto failed;
    Py_DECREF(value);
    break;

However, I think modifying a reference count without a change of ownership is futile. What I expect is that each variable holding a primitive value (float, integer, etc.) always has its own value, instead of referring to a shared object.

And when executing my simple test code, I found that the breakpoint in the Num_kind branch above is never reached:

def some_function(x, y):
    return (x+y)*(x-y)

a = some_function(666666,66666)
print a

b = a
print a
print b

b = a + 999999
print a
print b

b = a
print a
print b

I'm using the python2.7-dbg program provided by Debian. I'm sure the program and the source code match, because many other breakpoints work properly.

So, what does CPython actually do on primitive type objects?

asked Feb 07 '23 19:02 by jiandingzhe



1 Answer

First of all, there are no “primitive objects” in Python. Everything is an object, of the same kind, and they are all handled in the same way on the language level. As such, the following assignments work the same way regardless of the values which are assigned:

a = some_process_that_generates_integer_result()
b = a

In Python, assignments are always reference copies. So whatever the function returns, its reference is copied into the variable a. And then in the second line, the reference is again copied into the variable b. As such, both variables will refer to the exact same object.

You can easily verify this by using the id() function which will tell you the identity of an object:

print id(a)
print id(b)

This will print the same identifying number twice. Note, though, that while doing just this, you copied the reference two more times: it is not variables that are passed to functions but copies of references.

This is different from other languages where you often differentiate between “call by value” and “call by reference”. The former means that you create a copy of the value and pass it to a function, which means that new memory is allocated for that value; the latter means that the actual reference is passed and changes to that reference affect the original variable as well.

What Python does is often called “call by assignment”: every function call where you pass arguments is essentially an assignment into new variables (which are then available to the function). And an assignment copies the reference.
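As a minimal sketch of "call by assignment": the function name inspect_and_rebind and the variable names below are hypothetical, chosen only for illustration. The function receives a reference to the caller's object, and rebinding the parameter inside the function does not affect the caller's variable.

```python
def inspect_and_rebind(x):
    identity_on_entry = id(x)   # x refers to the very object the caller passed
    x = x + 1                   # rebinds the local name x to a new object
    return identity_on_entry, x

value = 10 ** 20
identity_on_entry, rebound = inspect_and_rebind(value)

print(identity_on_entry == id(value))  # True: the same object was passed in
print(value == 10 ** 20)               # True: the caller's variable is untouched
print(rebound == 10 ** 20 + 1)         # True: the function produced a new object
```

The parameter x behaves exactly like the target of an ordinary assignment x = value, which is why no separate "by value" or "by reference" rule is needed.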

When everything is an object, this is actually a very simple strategy. And as I said above, what happens with integers is then no different to what happens to other objects. The only “special” thing about integers is that they are immutable, so you cannot change their values. This means that an integer object always represents the exact same value, which makes it easy to share the object (in memory) between multiple variables. Every operation that yields a new result gives you a different object, so when you do a series of arithmetic operations, you are actually changing which object a variable is pointing to all the time.
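A short sketch of that behaviour, using an arbitrary large integer (the variable names are only illustrative): after b = a both names refer to one object, and an arithmetic update rebinds b to a new object while a keeps the old one.

```python
a = 10 ** 20                 # an integer object
b = a                        # b now refers to the same object
shared_before = b is a
print(shared_before)         # True

b += 1                       # int is immutable: a new object is created and b is rebound
shared_after = b is a
print(shared_after)          # False
print(a)                     # 100000000000000000000 (unchanged)
```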

The same happens with other immutable objects too, for example strings. Every operation that yields a changed string gives you a different string object.

Assignments with mutable objects, however, work in exactly the same way. It’s just that those objects can be changed in place, so the sharing becomes visible. Consider this example:

a = [1] # creates a new list object
b = a # copies the reference to that same list object
c = [2] # creates a new list object
b = a + c # concats the two lists and creates a new list object
d = b
# at this point, we have *three* list objects
d.append(3) # mutates the list object
print(d)
print(b) # same result since b and d reference the same list object

Now coming back to your question and the C code you cite there: you are actually looking at the wrong part of CPython to get an explanation. The AST is the abstract syntax tree that the parser creates when parsing a file. It reflects the syntax structure of a program but says nothing about the actual run-time behavior yet.

The code you showed for the Num_kind is actually responsible for creating Num AST objects. You can get an idea of this when using the ast module:

>>> import ast
>>> doc = ast.parse('foo = 5')

# the document contains an assignment
>>> doc.body[0]
<_ast.Assign object at 0x0000000002322278>

# the target of that assignment has the id `foo`
>>> doc.body[0].targets[0].id
'foo'

# and the value of that assignment is the `Num` object that was
# created in that C code, with that `n` property containing the value
>>> doc.body[0].value
<_ast.Num object at 0x00000000023224E0>
>>> doc.body[0].value.n
5
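To illustrate that the AST stage is purely syntactic, here is a small sketch that parses an assignment whose right-hand side names something that does not exist. The name some_undefined_name is made up for this example. Building the tree succeeds because nothing is evaluated.

```python
import ast

# Parsing never runs the code: the name on the right-hand side is not
# defined anywhere, yet building the syntax tree succeeds.
tree = ast.parse("b = some_undefined_name")
assign = tree.body[0]

print(type(assign).__name__)         # Assign
print(assign.targets[0].id)          # b
print(type(assign.value).__name__)   # Name
print(assign.value.id)               # some_undefined_name
```

Only executing the compiled code would raise a NameError, which is why breakpoints in the AST conversion code say nothing about what happens when b = a runs.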

If you want to get an idea of the actual evaluation of Python code, you should first look at the byte code. The byte code is what is being executed at run-time by the virtual machine. You can use the dis module to see byte code for Python code:

>>> def test():
...     foo = 5
...
>>> import dis
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (5)
              3 STORE_FAST               0 (foo)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE

As you can see, there are two major byte code instructions here: LOAD_CONST and STORE_FAST. LOAD_CONST just loads a constant value onto the evaluation stack. In this example we load a constant number, but the value could just as well come from a function call, in which case a different instruction (CALL_FUNCTION) leaves the result on the stack (try playing with the dis module to figure out how it works).
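For the b = a case from the question, a sketch along the following lines may help. It assumes Python 3.4 or later, where dis.get_instructions is available, and the function name copy_reference is hypothetical. Exact opcode names vary between CPython versions, so the check only looks for names containing LOAD_FAST and STORE_FAST.

```python
import dis

def copy_reference(a):
    b = a          # the assignment under inspection
    return b

# Collect opcode names instead of relying on the printed listing,
# whose exact layout and opcodes differ between CPython versions.
opnames = [instr.opname for instr in dis.get_instructions(copy_reference)]
print(opnames)

has_load = any("LOAD_FAST" in name for name in opnames)
has_store = any("STORE_FAST" in name for name in opnames)
print(has_load, has_store)
```

No instruction in the listing creates a new integer: the assignment is a load of an existing reference followed by a store.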

The assignment itself is made using STORE_FAST. The byte code interpreter does the following for that instruction:

TARGET(STORE_FAST)
{
    v = POP();
    SETLOCAL(oparg, v);
    FAST_DISPATCH();
}

So it essentially gets the value (the reference to the integer object) from the stack, and then calls SETLOCAL, which stores that reference in the local variable slot and releases whatever reference the slot held before.

Note, though, that STORE_FAST itself does not increase the reference count of that value: it takes over the reference that was already on the stack. The increment happens in LOAD_CONST, or in any other byte code instruction that gets a value from somewhere:

TARGET(LOAD_CONST)
{
    x = GETITEM(consts, oparg);
    Py_INCREF(x);
    PUSH(x);
    FAST_DISPATCH();
}
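The net effect of such a load and store pair can be observed from Python with sys.getrefcount. This is only a sketch: the function name count_delta is hypothetical, the absolute numbers reported by sys.getrefcount depend on the interpreter version, and the integer is built at run time so that it is a fresh object rather than a shared constant. On a standard CPython build, binding a second name is expected to raise the count by one.

```python
import sys

def count_delta():
    value = int("123456789" * 3)      # built at run time, so a fresh int object
    before = sys.getrefcount(value)
    alias = value                     # copies the reference, not the integer
    after = sys.getrefcount(value)
    return alias is value, after - before

same_object, delta = count_delta()
print(same_object)   # True
print(delta)
```

Only the difference between the two readings is meaningful here, since the call to sys.getrefcount may itself hold a temporary reference.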

So tl;dr: Assignments in Python are always reference copies. References are also copied whenever a value is used (but in many other situations that copied reference only exists for a short time). The AST is responsible for creating an object representation of parsed programs (only the syntax), while the byte code interpreter runs the previously compiled byte code to do actual stuff at run-time and deal with real objects.

answered Feb 09 '23 08:02 by poke
