It looks like, for Cython's cdef classes, using special methods is sometimes faster than an identical "usual" method. For example, __setitem__ is 3 times faster than setitem:
%%cython
cdef class CyA:
    def __setitem__(self, index, val):
        pass
    def setitem(self, index, val):
        pass
and now:
cy_a=CyA()
%timeit cy_a[0]=3 # 32.4 ns ± 0.195 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit cy_a.setitem(0,3) # 97.5 ns ± 0.389 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
This is neither the "normal" behavior for Python, where the special methods are even somewhat slower (and obviously slower than the Cython equivalent):
class PyA:
    def __setitem__(self, index, val):
        pass
    def setitem(self, index, val):
        pass
py_a=PyA()
%timeit py_a[0]=3 # 198 ns ± 2.51 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit py_a.setitem(0,3) # 123 ns ± 0.619 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
nor is this the case in Cython for all special methods:
%%cython
cdef class CyA:
    ...
    def __len__(self):
        return 1
    def len(self):
        return 1
which leads to:
cy_a=CyA()
%timeit len(cy_a) # 59.6 ns ± 0.233 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit cy_a.len() # 66.5 ns ± 0.326 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
i.e. almost identical running times.
Why is __setitem__(...) so much faster than setitem(...) in a cdef class, even if both are cythonized?
There's quite a bit of overhead for a generic Python method call: Python looks up the relevant attribute (a dictionary lookup), ensures that the attribute is a callable object, and, once it has been called, handles the result. This overhead also applies to generic def functions of cdef classes (the only difference being that the implementation of the method is defined in C).
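The steps of a generic method call can be spelled out by hand in plain Python. This is only a rough model of what the interpreter does (CPython takes shortcuts internally), and the names func, bound, manual_result and direct_result are made up for illustration:

```python
class PyA:
    def setitem(self, index, val):
        return (index, val)

obj = PyA()

# Step 1: attribute lookup. The function object lives in the class dictionary.
func = type(obj).__dict__["setitem"]

# Step 2: binding. Functions are descriptors, so the lookup yields a bound method.
bound = func.__get__(obj, type(obj))

# Step 3: check that the result is callable, call it, and handle the returned object.
assert callable(bound)
manual_result = bound(0, 3)

# The ordinary spelling performs equivalent work behind the scenes.
direct_result = obj.setitem(0, 3)
print(manual_result, direct_result)
```

None of these steps disappears just because the body of setitem is compiled to C, which is the point made above.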
However, special methods on C/Cython classes can be optimised, as follows:
As a shortcut, PyTypeObject in the Python C API defines a number of different "slots": direct function pointers for special methods. For __setitem__ there are actually two available: PyMappingMethods.mp_ass_subscript, which corresponds to a generic "mapping" call, and PySequenceMethods.sq_ass_item, which lets you use an int as the indexer directly and corresponds to the C API function PySequence_SetItem.
For a cdef class, Cython only seems to generate the first (generic) one, so the speedup isn't from passing a C int directly. Cython does not fill these slots when generating a non-cdef class.
The advantage of these is that (for a C/Cython class) finding the __setitem__ function just involves a couple of pointer NULL checks followed by a C function call. This also applies to __len__, which is also defined by slots in PyTypeObject.
In contrast, for a Python class defining __setitem__, the slot is filled with a default implementation which does a dictionary lookup for the string "__setitem__".
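The difference between a C-level slot and the Python-class fallback can be observed from plain Python. The sketch below assumes CPython; the helper instance_level and the variable names are hypothetical. It shows that obj[0] = 3 consults the type rather than the instance, and that a C type such as list exposes __setitem__ as a thin wrapper around its slot, whereas a Python class stores an ordinary function:

```python
class PyA:
    def __setitem__(self, index, val):
        self.last = ("class", index, val)

def instance_level(index, val):
    raise AssertionError("obj[...] = ... never looks here")

obj = PyA()
# Stored in the instance dictionary; the subscript syntax ignores it.
obj.__setitem__ = instance_level

# Goes through the type's slot, which looks up "__setitem__" on the class.
obj[0] = 3

# For a C type the slot comes first and the attribute is a wrapper around it.
list_kind = type(list.__setitem__).__name__
# For a Python class the attribute is a plain function found by name.
py_kind = type(PyA.__setitem__).__name__
print(obj.last, list_kind, py_kind)
```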
For either a cdef or Python class calling a non-special def function, the attribute is looked up from the class/instance dictionary (which is slower).
Note that if the regular setitem function were defined in a cdef class as cpdef instead (and called from Cython), then Cython implements its own mechanism for a speedy lookup.
Having found the attribute, it must be called. Where the special functions have been retrieved from PyTypeObject (e.g. __setitem__ and __len__ on a cdef class), they are simply C function pointers and so can be called directly.
For every other case, the PyObject retrieved from attribute lookup must be evaluated to see if it's a callable, then called.
When __setitem__ is called from PyTypeObject as a special function, the return value is an int, which is simply used as an error flag. No reference counting or handling of Python objects is needed.
When __len__ is called from a PyTypeObject as a special function, the return type is a Py_ssize_t, which must be converted to a Python object and then destroyed when no longer needed.
For normal functions (e.g. setitem called from a Python or Cython class, or __setitem__ defined in a Python class), the return value is a PyObject*, which must be reference counted/destroyed appropriately.
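The different return conventions described above leave visible traces even in pure Python. The following sketch assumes CPython; the class Weird and the variable names are invented for illustration. It shows that the value returned by __setitem__ only exists for an explicit method call, and that len() forces the result through a Py_ssize_t:

```python
class Weird:
    def __setitem__(self, index, val):
        return "only visible to explicit calls"

    def __len__(self):
        return 2 ** 100  # too large for a Py_ssize_t

w = Weird()

# Ordinary method call: a Python object comes back and must be handled.
explicit = w.__setitem__(0, 3)

# Slot call: only a success/failure flag survives, the returned string is dropped.
w[0] = 3

# Ordinary method call: the big int is returned unchanged.
raw_len = w.__len__()

# Slot call: the result has to fit into a Py_ssize_t, so this fails.
try:
    len(w)
    len_error = None
except OverflowError as exc:
    len_error = exc
print(explicit, raw_len, type(len_error).__name__)
```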
In summary, the difference is really to do with shortcuts in finding and calling the function rather than whether the contents of the function are Cythonized.
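A similar comparison can be tried without Cython by using a built-in C type such as list, whose slots are filled in the same way as for a cdef class. This is a minimal timing sketch, not a rigorous benchmark; time_ns is a hypothetical helper and the numbers depend on the machine and Python version:

```python
import timeit

lst = [0]

def time_ns(stmt, number=200_000):
    """Return the average time per execution of stmt in nanoseconds."""
    total = timeit.timeit(stmt, globals={"lst": lst}, number=number)
    return total / number * 1e9

# Slot path: the subscript syntax reaches the C function pointer directly.
slot_ns = time_ns("lst[0] = 3")

# Attribute path: look up "__setitem__", build a callable, call it, handle the result.
attr_ns = time_ns("lst.__setitem__(0, 3)")

print(f"lst[0] = 3            : {slot_ns:.1f} ns")
print(f"lst.__setitem__(0, 3) : {attr_ns:.1f} ns")
```

Both statements end up running the same C code for the assignment itself, so any gap between the two timings comes from how the function is found and called.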