
Why is __setitem__ much faster than an equivalent "normal" method for cdef-classes?

It looks like, for Cython's cdef classes, a special method is sometimes faster than an identical "usual" method. For example, __setitem__ is 3 times faster than setitem:

%%cython
cdef class CyA:
    def __setitem__(self, index, val):
        pass
    def setitem(self, index, val):
        pass

and now:

cy_a=CyA()
%timeit cy_a[0]=3              # 32.4 ns ± 0.195 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit cy_a.setitem(0,3)      # 97.5 ns ± 0.389 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

This is neither the "normal" behavior for Python, where the special functions are even somewhat slower (and obviously slower than the Cython equivalent):

class PyA:
    def __setitem__(self, index, val):
        pass
    def setitem(self, index, val):
        pass

py_a=PyA()
%timeit py_a[0]=3           # 198 ns ± 2.51 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit py_a.setitem(0,3)   # 123 ns ± 0.619 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

nor is this the case in Cython for all special functions:

%%cython
cdef class CyA:
    ...
    def __len__(self):
        return 1
    def len(self):
        return 1

which leads to:

cy_a=CyA()
%timeit len(cy_a)    #  59.6 ns ± 0.233 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit cy_a.len()   #  66.5 ns ± 0.326 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

i.e. almost identical running times.
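
Outside IPython, the pure-Python measurement can be approximated with the standard timeit module. This is only a sketch: the helper best_ns and its repeat counts are illustrative choices, not part of the original measurements, and absolute numbers will differ by machine and Python version.

```python
import timeit

class PyA:
    def __setitem__(self, index, val):
        pass
    def setitem(self, index, val):
        pass

py_a = PyA()

def best_ns(stmt, number=200_000, repeat=5):
    """Return the best per-call time in nanoseconds over several repeats.

    Hypothetical helper, roughly mimicking what %timeit reports.
    """
    timings = timeit.repeat(stmt, globals={"py_a": py_a},
                            number=number, repeat=repeat)
    return min(timings) / number * 1e9

special_ns = best_ns("py_a[0] = 3")
regular_ns = best_ns("py_a.setitem(0, 3)")
print(f"py_a[0] = 3        : {special_ns:.1f} ns per call")
print(f"py_a.setitem(0, 3) : {regular_ns:.1f} ns per call")
```

The same helper could be pointed at a compiled cdef class by passing that instance in globals instead.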

Why is __setitem__(...) so much faster than setitem(...) in a cdef class, even though both are cythonized?

ead asked Nov 28 '18 21:11





1 Answer

There's quite a bit of overhead for a generic Python method call: Python looks up the relevant attribute (a dictionary lookup), ensures that the attribute is a callable object, and, once it has been called, handles the result. This overhead also applies to generic def functions on cdef classes (the only difference being that the implementation of the method is defined in C).
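
As a rough illustration, the three steps can be spelled out by hand in plain Python. This is a sketch of the logical sequence only; CPython's bytecode may shortcut parts of it, and the class simply mirrors the PyA from the question.

```python
class PyA:
    def setitem(self, index, val):
        pass

py_a = PyA()

# Step 1: attribute lookup. The name is searched for on the type
# (and in the instance dictionary, if the object has one).
method = getattr(py_a, "setitem")

# Step 2: the result is an ordinary Python object that must be callable.
is_callable = callable(method)

# Step 3: the call returns a Python object (here None) whose reference
# the interpreter has to manage and eventually release.
result = method(0, 3)

print(type(method).__name__, is_callable, result)
```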

However, special methods on C/Cython classes can be optimised, as follows:

Lookup speed

As a shortcut, PyTypeObject in the Python C API defines a number of different "slots": direct function pointers for special methods. For __setitem__ there are actually two available: PyMappingMethods.mp_ass_subscript, which corresponds to a generic "mapping" call, and PySequenceMethods.sq_ass_item, which lets you use an int as the indexer directly and corresponds to the C API function PySequence_SetItem.

For a cdef class, Cython only seems to generate the first (generic) one, so the speedup isn't from passing a C int directly. Cython does not fill these slots when generating a non-cdef class.

The advantage of these slots is that (for a C/Cython class) finding the __setitem__ function just involves a couple of pointer NULL checks followed by a C function call. This also applies to __len__, which is likewise defined by a slot in PyTypeObject.

In contrast,

  • For a Python class, calling __setitem__ instead goes through a default slot implementation, which does a dictionary lookup for the string "__setitem__".

  • For either a cdef or a Python class, calling a non-special def function means the attribute is looked up in the class/instance dictionary (which is slower).
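
Part of this can be observed from plain Python. The sketch below assumes CPython; the class and the replacement function are made up for illustration. It shows that a C type exposes its slot through a thin wrapper object, and that for a Python class py_a[0] = 3 resolves __setitem__ on the type at call time rather than on the instance.

```python
class PyA:
    def __setitem__(self, index, val):
        self.last = ("class", index, val)

py_a = PyA()

# A built-in C type exposes its slot via a wrapper object (CPython detail),
# whereas the Python class just stores a plain function in its dictionary.
print(type(list.__setitem__).__name__)             # wrapper_descriptor on CPython
print(type(PyA.__dict__["__setitem__"]).__name__)  # function

# An attribute stored on the instance is ignored by the subscript syntax ...
py_a.__setitem__ = lambda index, val: None
py_a[0] = 3
print(py_a.last)   # ('class', 0, 3)

# ... but replacing the function on the class takes effect immediately,
# because the default slot implementation looks the name up on the type.
def replacement(self, index, val):
    self.last = ("replaced", index, val)

PyA.__setitem__ = replacement
py_a[1] = 4
print(py_a.last)   # ('replaced', 1, 4)
```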

Note that if the regular setitem function were defined in a cdef class as cpdef instead (and called from Cython), then Cython would use its own mechanism for a speedy lookup.

Calling efficiency

Once the attribute has been found, it must be called. Where the special functions have been retrieved from PyTypeObject (e.g. __setitem__ and __len__ on a cdef class), they are simply C function pointers and so can be called directly.

For every other case, the PyObject retrieved from attribute lookup must be checked to see whether it is callable, and then called.

Return handling

When __setitem__ is called from PyTypeObject as a special function the return value is an int, which is simply used as an error flag. No reference counting or handling of Python objects is needed.

When __len__ is called from a PyTypeObject as a special function, the return type is a Py_ssize_t, which must be converted to a Python object and then destroyed when no longer needed.

For normal functions (e.g. setitem called from a Python or Cython class, or __setitem__ defined in a Python class), the return value is a PyObject*, which must be reference counted/destroyed appropriately.
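
The C-level return types are not directly visible from Python, but their effects are. In the sketch below (the Sized class and try_len helper are hypothetical), len() insists on something that fits a non-negative Py_ssize_t, and the value returned by __setitem__ is dropped when the subscript syntax is used.

```python
class Sized:
    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length

    def __setitem__(self, index, val):
        return "ignored"  # discarded when called via obj[index] = val

def try_len(obj):
    """Return len(obj), or the name of the exception it raised."""
    try:
        return len(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

print(try_len(Sized(1)))     # 1
print(try_len(Sized(-1)))    # ValueError: a length cannot be negative
print(try_len(Sized("a")))   # TypeError: not convertible to an integer

s = Sized(1)
s[0] = 3                      # a statement: the return value is thrown away
print(s.__setitem__(0, 3))    # an explicit call does see the returned object
```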


In summary, the difference is really to do with shortcuts in finding and calling the function, rather than with whether the body of the function is Cythonized.

DavidW answered Oct 04 '22 18:10
