
What are the relative advantages of extending NumPy in Cython vs Boost.Python?

I need to speed up some algorithms working on NumPy arrays. They will use std::vector and some of the more advanced STL data structures.

I've narrowed my choices down to Cython (which now wraps most STL containers) and Boost.Python (which now has built-in support for NumPy).

I know from my experience as a programmer that sometimes it takes months of working with a framework to uncover its hidden issues (because they are rarely used as talking points by its disciples), so your help could potentially save me a lot of time.

What are the relative advantages and disadvantages of extending NumPy in Cython vs Boost.Python?

MWB asked Jan 23 '17 19:01


2 Answers

This is a very incomplete answer that only really covers a couple of small parts of it (I'll edit it if I think of anything more):


Boost.Python doesn't appear to implement operator[] specifically for NumPy arrays. This means operator[] comes from the base object class (which ndarray inherits from), so each call goes through the Python machinery to __getitem__ and indexing will be slow (close to Python speed). If you want fast indexing you'll have to do the pointer arithmetic yourself:

// rough gist - untested:

// i,j,k are your indices

// get_data() returns char* and strides() are in bytes,
// so add the byte offset first and cast afterwards
char* data = array.get_data();

// in reality you'd check the dtype - the data may not be a double...
double data_element = *reinterpret_cast<double*>(data + array.strides(0)*i + array.strides(1)*j + array.strides(2)*k);

In contrast, Cython has efficient indexing of NumPy arrays built in.
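To make the stride arithmetic above concrete, here is a minimal sketch that runs without Boost or NumPy. It assumes a 3-D array of doubles described only by a raw byte pointer and per-axis byte strides, which is the information get_data() and strides(n) would give you. The names element_at and demo_lookup are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not a Boost.Python function): read element (i, j, k)
// of a 3-D double array from its raw byte buffer and per-axis byte strides.
inline double element_at(const char* data, const std::ptrdiff_t strides[3],
                         std::ptrdiff_t i, std::ptrdiff_t j, std::ptrdiff_t k) {
    assert(data != nullptr);
    // Strides are in bytes, so the offset is applied to the char pointer.
    const char* p = data + strides[0] * i + strides[1] * j + strides[2] * k;
    double value;
    std::memcpy(&value, p, sizeof value);  // also fine if p is not double-aligned
    return value;
}

// Build a C-contiguous 2x3x4 buffer whose element (a, b, c) holds
// 100*a + 10*b + c, then read one element back through the strides.
inline double demo_lookup(std::ptrdiff_t i, std::ptrdiff_t j, std::ptrdiff_t k) {
    const std::ptrdiff_t shape[3] = {2, 3, 4};
    std::vector<double> buf(static_cast<std::size_t>(shape[0] * shape[1] * shape[2]));
    for (std::ptrdiff_t a = 0; a < shape[0]; ++a)
        for (std::ptrdiff_t b = 0; b < shape[1]; ++b)
            for (std::ptrdiff_t c = 0; c < shape[2]; ++c)
                buf[static_cast<std::size_t>((a * shape[1] + b) * shape[2] + c)] =
                    static_cast<double>(100 * a + 10 * b + c);

    const std::ptrdiff_t s = static_cast<std::ptrdiff_t>(sizeof(double));
    // Byte strides of a C-contiguous array: the last axis varies fastest.
    const std::ptrdiff_t strides[3] = {shape[1] * shape[2] * s, shape[2] * s, s};
    return element_at(reinterpret_cast<const char*>(buf.data()), strides, i, j, k);
}
```

Because the offset is computed from the strides rather than from the shape, the same helper should also work for non-contiguous views (slices, transposes), where the strides differ from the contiguous ones.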


Cython isn't great at things like std::vector (although it isn't absolutely terrible - you can usually trick it into doing what you want). One notable limitation is that all cdefs have to go at the start of the function, so C++ classes will be default constructed there and then assigned to or manipulated later (which can be somewhat inefficient). For anything beyond simple uses you do not want to be manipulating C++ types in Cython (instead it's better to write the code in C++ and then call it from Cython).
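As a rough sketch of the "write it in C++, call it from Cython" approach: keep the STL usage inside a C++ function and expose only pointers and lengths, so the Cython side never has to touch std::vector. The function name sorted_unique is hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example function: copy the distinct values of in[0..n) into
// out in ascending order and return how many were written.
// out must have room for n doubles.
inline std::size_t sorted_unique(const double* in, std::size_t n, double* out) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    // The vector is constructed directly from the input range, rather than
    // default constructed first and filled afterwards.
    std::vector<double> tmp(in, in + n);
    std::sort(tmp.begin(), tmp.end());
    tmp.erase(std::unique(tmp.begin(), tmp.end()), tmp.end());
    std::copy(tmp.begin(), tmp.end(), out);
    return tmp.size();
}
```

On the Cython side this would only need a cdef extern declaration of the function, with the pointers taken from contiguous typed memoryviews (for example &arr[0]); the exact declaration depends on your header layout.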

A second limitation is that it struggles with non-type template parameters. One common example is std::array, which is templated on a number (its size) as well as on a type. Depending on your planned code this may or may not be an issue.
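One workaround people use for the std::array case, sketched here under the assumption that the sizes you need are known up front, is to fix the size in a C++ alias and expose a few plain functions. Cython then only sees a non-template name. Vec3, make_vec3, vec3_get and vec3_dot are hypothetical names for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical alias: the numeric template argument is fixed here in C++,
// so it never has to be spelled out on the Cython side.
using Vec3 = std::array<double, 3>;

inline Vec3 make_vec3(double x, double y, double z) {
    return Vec3{{x, y, z}};
}

// Bounds-checked element access (the check is active in debug builds only).
inline double vec3_get(const Vec3& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}

inline double vec3_dot(const Vec3& a, const Vec3& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

In the Cython declarations, Vec3 can then be treated as an ordinary non-template class. This only helps when a small, fixed set of sizes is enough; it does not make the template itself usable generically.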

DavidW answered Sep 23 '22 07:09


For small one-shot problems I tend to prefer Cython; for larger integrations with C++ code bases, I prefer Boost.Python.

In part, it depends on the audience for your code. If you're working with a team with significant experience in Python but little experience with C++, Cython makes sense. If you have a fixed code base with complex types to interoperate with, then Boost.Python can end up being a little cheaper to get running.

Cython encourages you to write incrementally, gradually adding types as required to get extra performance, and it solves many of the hard packaging problems. Boost.Python requires substantial effort to get a build set up, and it can be hard to produce packages that make sense on PyPI.

Cython has good built-in error messages/diagnostics, but from what I've seen, the errors that come out of Boost can be very hard to interpret - be kind to yourself and use a new-ish C++ compiler, preferably one known for producing readable error messages.

Don't discount alternative tools like Numba (similar performance to Cython with code that is Python, not just something that looks similar) and pybind11 (Boost.Python without Boost and with better error messages).

Andrew Walker answered Sep 19 '22 07:09