In the Cython docs there is an example where they give two ways of writing a C/Python hybrid method. An explicit one with a cdef for fast C access and a wrapper def for access from Python:
cdef class Rectangle:
    cdef int x0, y0
    cdef int x1, y1
    def __init__(self, int x0, int y0, int x1, int y1):
        self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
    cdef int _area(self):
        cdef int area
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if area < 0:
            area = -area
        return area
    def area(self):
        return self._area()
And one using cpdef:
cdef class Rectangle:
    cdef int x0, y0
    cdef int x1, y1
    def __init__(self, int x0, int y0, int x1, int y1):
        self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
    cpdef int area(self):
        cdef int area
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if area < 0:
            area = -area
        return area
I was wondering what the differences are in practical terms.
For example, is either method faster/slower when called from C/Python?
Also, when subclassing/overriding does cpdef offer anything that the other method lacks?
chrisb's answer gives you all you need to know, but if you are game for gory details...
But first, the takeaways from the lengthy analysis below, in a nutshell:
- For free functions, there is not much difference performance-wise between cpdef and rolling it out with cdef+def. The resulting C code is almost identical.
- For bound methods, the cpdef approach can be slightly faster in the presence of inheritance hierarchies, but nothing to get too excited about.
- Using the cpdef syntax has its advantages, as the resulting code is clearer (at least to me) and shorter.
Free functions:
When we define something silly like:
cpdef do_nothing_cp():
    pass
the following happens:
1. A fast C function is created (here it has the cryptic name __pyx_f_3foo_do_nothing_cp because my extension is called foo, but you actually only have to look for the f prefix).
2. A Python function is also created (called __pyx_pf_3foo_2do_nothing_cp, prefix pf); it does not duplicate the code but calls the fast function somewhere on the way.
3. A Python wrapper is created, called __pyx_pw_3foo_3do_nothing_cp (prefix pw).
4. A do_nothing_cp method definition is issued; this is what the Python wrapper is needed for, and this is where it is stored which function should be called when foo.do_nothing_cp is invoked.
You can see it in the produced C code here:
static PyMethodDef __pyx_methods[] = {
    {"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_3do_nothing_cp, METH_NOARGS, 0},
    {0, 0, 0, 0}
};
For a cdef function, only the first step happens; for a def function, only steps 2-4.
Now when we load module foo and invoke foo.do_nothing_cp(), the following happens:
1. The function pointer registered under the name do_nothing_cp is found, in our case the Python wrapper (the pw function).
2. The pw function is called via this function pointer and calls the pf function (as a plain C call).
3. The pf function calls the fast f function.
What happens if we call do_nothing_cp inside the Cython module?
def call_do_nothing_cp():
    do_nothing_cp()
Clearly, Cython doesn't need the Python machinery to locate the function in this case: it can directly use the fast f function via a C function call, bypassing the pw and pf functions.
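The two call paths described above can be modelled in plain Python. This is only a rough sketch of the layering, not what Cython actually generates: the names f_do_nothing_cp, pf_do_nothing_cp, pw_do_nothing_cp, METHOD_TABLE and the trace list are hypothetical stand-ins for the generated C functions and the PyMethodDef table.

```python
# Toy model of the three layers emitted for a cpdef free function.
# All names are hypothetical stand-ins for the generated C symbols.
trace = []

def f_do_nothing_cp():
    """Stand-in for the fast C function (prefix f)."""
    trace.append("f")

def pf_do_nothing_cp():
    """Stand-in for the Python-level function (prefix pf)."""
    trace.append("pf")
    f_do_nothing_cp()

def pw_do_nothing_cp():
    """Stand-in for the Python wrapper (prefix pw) that gets registered."""
    trace.append("pw")
    pf_do_nothing_cp()

# Stand-in for the PyMethodDef table: name -> wrapper.
METHOD_TABLE = {"do_nothing_cp": pw_do_nothing_cp}

def call_from_python(name):
    """Model of foo.do_nothing_cp(): look the name up, then go through pw."""
    trace.clear()
    METHOD_TABLE[name]()
    return list(trace)

def call_from_cython():
    """Model of a call made inside the module: straight to the f function."""
    trace.clear()
    f_do_nothing_cp()
    return list(trace)

print(call_from_python("do_nothing_cp"))  # ['pw', 'pf', 'f']
print(call_from_cython())                 # ['f']
```

In the real extension the three layers are C functions and the compiler may inline them, so the model only shows which layers are logically involved in each path.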
What happens if we wrap a cdef function in a def function?
cdef _do_nothing():
    pass
def do_nothing():
    _do_nothing()
Cython does the following:
1. A fast _do_nothing function is created, corresponding to the f function above.
2. A pf function for do_nothing is created, which calls _do_nothing somewhere on the way.
3. A Python wrapper, i.e. a pw function, is created, which wraps the pf function.
4. The functionality is bound to foo.do_nothing via a function pointer to the Python wrapper pw function.
As you can see, there is not much difference from the cpdef approach.
The cdef functions are just simple C functions, but def and cpdef functions are first-class Python functions; you could do something like this:
foo.do_nothing=foo.do_nothing_cp
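As a small illustration of why such a rebinding works, here is a sketch that uses a throwaway module object in place of the compiled extension. The module built with types.ModuleType and the two pure-Python functions are assumptions made for the example; they only mimic the Python-visible side of foo.

```python
import types

# Hypothetical stand-in for the compiled extension module foo.
foo = types.ModuleType("foo")

def do_nothing():
    return "def wrapper"

def do_nothing_cp():
    return "cpdef function"

foo.do_nothing = do_nothing
foo.do_nothing_cp = do_nothing_cp

# Module attributes are ordinary name bindings, so they can be rebound.
foo.do_nothing = foo.do_nothing_cp
print(foo.do_nothing())  # cpdef function
```

A cdef function has no such Python-visible binding at all, so there is nothing to rebind from the Python side.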
As to performance, we cannot expect much difference here:
>>> import foo
>>> %timeit foo.do_nothing_cp
51.6 ns ± 0.437 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit foo.do_nothing
51.8 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
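Outside of IPython, a comparable measurement can be taken with the standard timeit module. The sketch below is an assumption-laden example: do_nothing is a pure-Python stand-in rather than the compiled function, and time_call is a hypothetical helper. Note that the callable is actually invoked on every iteration; a statement without call parentheses would only time the attribute lookup.

```python
import timeit

def do_nothing():
    """Hypothetical pure-Python stand-in for foo.do_nothing."""
    pass

def time_call(func, number=100_000, repeat=5):
    """Return the best observed time per call of func(), in nanoseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9

ns_per_call = time_call(do_nothing)
print(f"{ns_per_call:.1f} ns per call")
```

Taking the minimum over several repeats reduces noise from other processes; the absolute numbers depend on the machine and interpreter.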
If we look at the resulting machine code (objdump -d foo.so), we can see that the C compiler has inlined all calls for the cpdef version do_nothing_cp:
0000000000001340 <__pyx_pw_3foo_3do_nothing_cp>:
1340: 48 8b 05 91 1c 20 00 mov 0x201c91(%rip),%rax
1347: 48 83 00 01 addq $0x1,(%rax)
134b: c3 retq
134c: 0f 1f 40 00 nopl 0x0(%rax)
but not for the rolled-out do_nothing (I must confess, I'm a little bit surprised and don't understand the reasons yet):
0000000000001380 <__pyx_pw_3foo_1do_nothing>:
1380: 53 push %rbx
1381: 48 8b 1d 50 1c 20 00 mov 0x201c50(%rip),%rbx # 202fd8 <_DYNAMIC+0x208>
1388: 48 8b 13 mov (%rbx),%rdx
138b: 48 85 d2 test %rdx,%rdx
138e: 75 0d jne 139d <__pyx_pw_3foo_1do_nothing+0x1d>
1390: 48 8b 43 08 mov 0x8(%rbx),%rax
1394: 48 89 df mov %rbx,%rdi
1397: ff 50 30 callq *0x30(%rax)
139a: 48 8b 13 mov (%rbx),%rdx
139d: 48 83 c2 01 add $0x1,%rdx
13a1: 48 89 d8 mov %rbx,%rax
13a4: 48 89 13 mov %rdx,(%rbx)
13a7: 5b pop %rbx
13a8: c3 retq
13a9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
This could explain why the cpdef version is slightly faster, but in any case the difference is nothing compared to the overhead of a Python function call.
Class-methods:
The situation is a little bit more complicated for class methods, because of the possible polymorphism. Let's start out with:
cdef class A:
    cpdef do_nothing_cp(self):
        pass
At first sight, there is not that much difference to the case above:
1. A fast C function, the f-prefix version of the function, is emitted.
2. A Python (prefix pf) version is emitted, which calls the f function.
3. A Python wrapper (prefix pw) wraps the pf version and is used for registration.
4. do_nothing_cp is registered as a method of class A via the tp_methods pointer of the PyTypeObject.
As can be seen in the produced C file:
static PyMethodDef __pyx_methods_3foo_A[] = {
    {"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_1A_1do_nothing_cp, METH_NOARGS, 0},
    ...
    {0, 0, 0, 0}
};
....
static PyTypeObject __pyx_type_3foo_A = {
    ...
    __pyx_methods_3foo_A, /*tp_methods*/
    ...
};
Clearly, the bound version has to have the implicit parameter self as an additional argument, but there is more to it: the f function performs a function dispatch if it is not called from the corresponding pf function. This dispatch looks as follows (I keep only the important parts):
static PyObject *__pyx_f_3foo_1A_do_nothing_cp(CYTHON_UNUSED struct __pyx_obj_3foo_A *__pyx_v_self, int __pyx_skip_dispatch) {
    if (unlikely(__pyx_skip_dispatch)) ; // __pyx_skip_dispatch=1 if called from the pf-version
    /* Check if overridden in Python */
    else if (/* look up whether the function is overridden in __dict__ of the object */) {
        /* use the overridden function */
    }
    /* do the work */
}
Why is it needed? Consider the following extension foo:
cdef class A:
    cpdef do_nothing_cp(self):
        pass
cdef class B(A):
    cpdef call_do_nothing(self):
        self.do_nothing_cp()
What happens when we call B().call_do_nothing()?
1. B-pw-call_do_nothing is located and called,
2. which calls B-pf-call_do_nothing,
3. which calls B-f-call_do_nothing,
4. which calls A-f-do_nothing_cp, bypassing the pw and pf versions.
What happens when we add the following class C, which overrides the do_nothing_cp function?
import foo
class C(foo.B):
    def do_nothing_cp(self):
        print("I do something!")
Now calling C().call_do_nothing() leads to:
1. call_do_nothing of the C class being located and called, which means pw-call_do_nothing of the B class being located and called,
2. which calls B-pf-call_do_nothing,
3. which calls B-f-call_do_nothing,
4. which calls A-f-do_nothing_cp (as we already know!), bypassing the pw and pf versions.
And now, in step 4, we need to dispatch the call in A-f-do_nothing_cp() in order to get the right C.do_nothing_cp() call! Luckily we have this dispatch in the function at hand!
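The dispatch can be mimicked in pure Python to make the control flow concrete. This is a simplified model under stated assumptions, not the generated code: the method _f_do_nothing_cp and its skip_dispatch argument are hypothetical stand-ins for the f function and __pyx_skip_dispatch, the override check looks only at the class (not at instance attributes), and the methods return strings so the result can be inspected.

```python
class A:
    def do_nothing_cp(self):
        # Python-visible entry point (the pw/pf pair). Python's own lookup
        # already picked this version, so the dispatch can be skipped.
        return A._f_do_nothing_cp(self, skip_dispatch=True)

    def _f_do_nothing_cp(self, skip_dispatch=False):
        # Model of the f function: unless told to skip, check whether a
        # Python subclass overrides the method and call that instead.
        if not skip_dispatch:
            override = getattr(type(self), "do_nothing_cp")
            if override is not A.do_nothing_cp:
                return override(self)
        return "A does nothing"


class B(A):
    def call_do_nothing(self):
        # Model of the direct call B-f-call_do_nothing -> A-f-do_nothing_cp.
        return A._f_do_nothing_cp(self)


class C(B):
    def do_nothing_cp(self):
        return "I do something!"


print(B().call_do_nothing())  # A does nothing
print(C().call_do_nothing())  # I do something!
```

Without the check inside _f_do_nothing_cp, the direct call in B.call_do_nothing would always end up in A's implementation and the override in C would be silently ignored.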
To make it more complicated: what if the class C were also a cdef class? The dispatch via __dict__ would not work, because cdef classes don't have a __dict__.
For cdef classes, polymorphism is implemented similarly to C++'s "virtual tables", so in B.call_do_nothing() the f-do_nothing_cp function is not called directly but via a pointer, which depends on the class of the object (one can see those "virtual tables" being set up in __pyx_pymod_exec_XXX, e.g. __pyx_vtable_3foo_B.__pyx_base). Thus the __dict__ dispatch in the A-f-do_nothing_cp() function is not needed in the case of a pure cdef hierarchy.
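The virtual-table idea can be sketched with dictionaries of functions. Everything here is a hypothetical model: VTABLE_A, VTABLE_B, VTABLE_C, Obj and call are illustration names, and real Cython uses C structs of function pointers rather than dictionaries.

```python
# Hypothetical model: each cdef class owns a table of function pointers.
def a_f_do_nothing_cp(self):
    return "A.do_nothing_cp"

def b_f_call_do_nothing(self):
    # Call through the object's table instead of naming A's function.
    return self.vtable["do_nothing_cp"](self)

def c_f_do_nothing_cp(self):
    return "C.do_nothing_cp"

VTABLE_A = {"do_nothing_cp": a_f_do_nothing_cp}
# A subclass starts from a copy of its base table, then adds or replaces slots.
VTABLE_B = {**VTABLE_A, "call_do_nothing": b_f_call_do_nothing}
VTABLE_C = {**VTABLE_B, "do_nothing_cp": c_f_do_nothing_cp}

class Obj:
    """Minimal object carrying a pointer to its class's table."""
    def __init__(self, vtable):
        self.vtable = vtable

def call(obj, name):
    """Invoke the slot called name from the object's own table."""
    return obj.vtable[name](obj)

print(call(Obj(VTABLE_B), "call_do_nothing"))  # A.do_nothing_cp
print(call(Obj(VTABLE_C), "call_do_nothing"))  # C.do_nothing_cp
```

Because the slot is resolved through the object's own table, the override in C is found with a single indirect call and no attribute lookup is needed.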
As to performance, comparing cpdef with cdef+def I get:
                  cpdef    def+cdef
A.do_nothing      107ns    108ns
B.call_nothing    109ns    116ns
so the difference isn't that large, with cpdef being, if anything, slightly faster.
See docs here - for most purposes they are practically the same, cpdef has slightly more overhead but plays nicer with inheritance.
The directive cpdef makes two versions of the method available; one fast for use from Cython and one slower for use from Python. Then:
This does slightly more than providing a python wrapper for a cdef method: unlike a cdef method, a cpdef method is fully overridable by methods and instance attributes in Python subclasses. It adds a little calling overhead compared to a cdef method.