Presumably both mylist.reverse()
and list.reverse(mylist)
end up executing reverse_slice
in listobject.c
via list_reverse_impl
or PyList_Reverse
. But how do they actually get there? What are the paths from the Python expressions to the C code in that C file? What connects them? And which of those two reverse functions (if any) do they go through?
Update for the bounty: Dimitris' answer (Update 2: I meant the original version, before it was expanded now) and the comments under it explain parts, but I'm still missing a few things and would like to see a comprehensive answer.
LOAD_METHOD
, would clarify this. (As somewhat done by the comments under Dimitris' answer.)list_reverse
function in the listobject.c.h
file? I don't think the Python interpreter is like "let's look for a file that sounds similar and for a function that sounds similar". I rather suspect that the list
type is defined somewhere and is somehow "registered" under the name "list
", and that the reverse
function is "registered" under the name "reverse
" (maybe that's what the LIST_REVERSE_METHODDEF
macro does?).call_function
). Really what interests me here is what I said originally, the path from the Python expressions to that C code in that C file. And preferably how I can find such paths in general.To explain my motivation: For another question I wanted to know what C code does the work when I call list.reverse(mylist)
. I was fairly confident I had found it by browsing around and searching for the names. But I want to be more certain and generally understand the connections better.
PyList_Reverse
is part of the C-API, you'd call it if you were manipulating Python lists in C, it isn't used in any of the two cases.
These both go through list_reverse_impl
(actually list_reverse
which wraps list_reverse_impl
) which is the C function that implements both list.reverse
and list_instance.reverse
.
Both calls are handled by call_function
in ceval
, getting there after the CALL_METHOD
opcode generated for them is executed (dis.dis
the statements to see it). call_function
has gone under a good deal of changes in Python 3.8 (with the introduction of PEP 590) so what happens from there on is probably too big of a subject to get into in a single question.
Additional Questions:
How do the two paths from the two python expressions converge? If I understand things correctly, disassembling and discussing the byte code and what happens to the stack, particularly
LOAD_METHOD
, would clarify this.
Let's start after both expressions compile to their respective bytecode representations:
l = [1, 2, 3, 4]
Case A, for l.reverse()
we have:
1 0 LOAD_NAME 0 (l)
2 LOAD_METHOD 1 (reverse)
4 CALL_METHOD 0
6 RETURN_VALUE
Case B, for list.reverse(l)
we have:
1 0 LOAD_NAME 0 (list)
2 LOAD_METHOD 1 (reverse)
4 LOAD_NAME 2 (l)
6 CALL_METHOD 1
8 RETURN_VALUE
We can safely ignore the RETURN_VALUE
opcode, doesn't really matter here.
Let's focus on the individual implementations for each op code, namely, LOAD_NAME
, LOAD_METHOD
and CALL_METHOD
. We can see what gets pushed onto the value stack by viewing what operations are called on it. (Note, it is initialized to point to the value stack located inside the frame object for each expression.)
LOAD_NAME
:
What is performed in this case is pretty straight-forward. Given our name, l
or list
in each case, (each name is found in `co->co_names, a tuple that stores the names we use inside the code object) the steps are:
locals
. If found, go to 4.globals
. If found, go to 4.builtins
. If found, go to 4.In case A, the name l
is found in the globals. In case B, it is found in the builtins. So, after the LOAD_NAME
, the stack looks like:
Case A: stack_pointer -> [1, 2, 3, 4]
Case B: stack_pointer -> <type list>
LOAD_METHOD
:
First, I should not that this opcode is generated only if an attribute access is performed (i.e obj.attr
). You could also grab a method and call it via a = obj.attr
and then a()
but that would result in a CALL_FUNCTION
opcode generated (see further down for a bit more).
After loading the name of the callable (reverse
in both cases) we search the object on the top of the stack (either [1, 2, 3, 4]
or list
) for a method named reverse
. This is done with _PyObject_GetMethod
, its documentation states:
Return 1 if a method is found, 0 if it's a regular attribute from
__dict__
or something returned by using a descriptor protocol.
A method is only found in Case A, when we access the attribute (reverse
) through an instance of the list object. In case B, the callable is returned after the descriptor protocol is invoked, so the return value is 0 (but we do of-course get the object back!).
Here we diverge on the value returned:
In Case A:
SET_TOP(meth);
PUSH(obj); // self
We have a SET_TOP
followed by a PUSH
. We moved the method to the top of the stack and then we push the value again. In this case, stack_pointer
now looks:
stack_pointer -> [1, 2, 3, 4]
<reverse method of lists>
In Case B we have:
SET_TOP(NULL);
Py_DECREF(obj);
PUSH(meth);
Again a SET_TOP
followed by a PUSH
. The reference count of obj
(i.e list
) is decreased because, as far as I can tell, it isn't really needed anymore. In this case, the stack now looks like so:
stack_pointer -> <reverse method of lists>
NULL
For case B, we have an additional LOAD_NAME
. Following the previous steps, the stack for Case B now becomes:
stack_pointer -> [1, 2, 3, 4]
<reverse method of lists>
NULL
Pretty similar.
CALL_METHOD
:
This doesn't make any modifications to the stack. Both cases result in a call to call_function
passing the thread state, the stack pointer and the number of positional arguments (oparg
).
The only difference is in the expression used to pass the positional arguments.
For case A we need to account for the implicit self
that should be inserted as the first positional argument. Since the op code generated for it doesn't signal that a positional argument has been passed (because none have been explicitly passed):
4 CALL_METHOD 0
we call call_function
with oparg + 1 = 0 + 1 = 1
to signal that one positional argument exists on the stack ([1, 2, 3, 4
]).
In case B, where we explicitly pass the instance as a first argument, this is accounted for:
6 CALL_METHOD 1
so the call to call_function
can immediately pass oparg
as the value for the positional arguments.
What is the "unbound method" pushed onto the stack? Is it a "C function" (which one?) or a "Python object"?
It is a Python object that wraps around a C function. The Python object is a method descriptor and the C function it wraps is list_reverse
.
All built-in methods and function are implemented in C. During initialization, CPython initializes all builtins (see list
here) and adds wrappers around all the methods. These wrappers (objects) are descriptors that are used to implement Methods and Functions.
When a method is retrieved from a class via one of its instances, it is said to be bound to that instance. This can be seen by peeking at the __self__
attribute assigned to it:
m = [1, 2, 3, 4].reverse
m() # use __self__
print(m.__self__) # [4, 3, 2, 1]
This method can still be called even without the instance that qualifies it. It is bound to that instance. (NOTE: This is handled by the CALL_FUNCTION
opcode, not the LOAD/CALL_METHOD
ones).
An unbound method is one that is not yet bound to an instance. list.reverse
is unbound, it is waiting to be invoked through an instance to bind to it.
Something being unbound doesn't mean that it cannot be called, list.reverse
is called just fine if you explicitly pass the self
argument yourself as an argument. Remember that methods are just special functions that (among other things) implicitly pass self
as a first argument after being bound to an instance.
How can I tell that it's the
list_reverse
function in the listobject.c.h file?
This is easy, you can see the list's methods getting initialized in listobject.c
. LIST_REVERSE_METHODDEF
is simply a macro that, when substituted, adds the list_reverse
function to that list. The tp_methods
of a list are then wrapped inside function objects as stated previously.
Things might seem complicated here because CPython uses an internal tool, argument clinic, to automate argument handling. This kinda moves definitions around, obfuscating slightly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With