C++ Model
Say I have the following C++ data structures I wish to expose to Python.
#include <memory>
#include <vector>
struct mystruct
{
int a, b, c, d, e, f, g, h, i, j, k, l, m;
};
typedef std::vector<std::shared_ptr<mystruct>> mystruct_list;
Boost Python
I can wrap these fairly effectively using boost::python with the following code, easily allowing me to use the existing mystruct (copying the shared_ptr) rather than recreating an existing object.
#include "mystruct.h"
#include <boost/python.hpp>
using namespace boost::python;
BOOST_PYTHON_MODULE(example)
{
class_<mystruct, std::shared_ptr<mystruct>>("MyStruct", init<>())
.def_readwrite("a", &mystruct::a);
// add the rest of the member variables
class_<mystruct_list>("MyStructList", init<>())
.def("at", &mystruct_list::at, return_value_policy<copy_const_reference>());
// add the rest of the member functions
}
Cython
In Cython, I have no idea how to extract an item from mystruct_list, without copying the underlying data. I have no idea how I could initialize MyStruct
from the existing shared_ptr<mystruct>
, without copying all the data over in one of various forms.
from libcpp.memory cimport shared_ptr
from cython.operator cimport dereference
cdef extern from "mystruct.h" nogil:
cdef cppclass mystruct:
int a, b, c, d, e, f, g, h, i, j, k, l, m
ctypedef vector[v] mystruct_list
cdef class MyStruct:
cdef shared_ptr[mystruct] ptr
def __cinit__(MyStruct self):
self.ptr.reset(new mystruct)
property a:
def __get__(MyStruct self):
return dereference(self.ptr).a
def __set__(MyStruct self, int value):
dereference(self.ptr).a = value
cdef class MyStructList:
cdef mystruct_list c
cdef mystruct_list.iterator it
def __cinit__(MyStructList self):
pass
def __getitem__(MyStructList self, int index):
# How do return MyStruct without copying the underlying `mystruct`
pass
I see many possible workarounds, and none of them are very satisfactory:
I could initialize an empty MyStruct
, and in Cython assign over the shared_ptr. However, this would result in wasting an initalized struct for absolutely no reason.
MyStruct value
value.ptr = self.c.at(index)
return value
I also could copy the data from the existing mystruct
to the new mystruct
. However, this suffers from similar bloat.
MyStruct value
dereference(value.ptr).a = dereference(self.c.at(index)).a
return value
I could also expose a init=True
flag for each __cinit__
method, which would prevent reconstructing the object internally if the C-object exists already (when init is False). However, this could cause catastrophic issues, since it would be exposed to the Python API and would allow dereferencing a null or uninitialized pointer.
def __cinit__(MyStruct self, bint init=True):
if init:
self.ptr.reset(new mystruct)
I could also overload __init__
with the Python-exposed constructor (which would reset self.ptr
), but this would have risky memory safety if __new__
was used from the Python layer.
Bottom-Line
I would love to use Cython, for compilation speed, syntactical sugar, and numerous other reasons, as opposed to the fairly clunky boost::python. I'm looking at pybind11 right now, and it may solve the compilation speed issues, but I would still prefer to use Cython.
Is there any way I can do such a simple task idiomatically in Cython? Thanks.
The way this works in Cython is by having a factory class to create Python objects out of the shared pointer. This gives you access to the underlying C/C++ structure without copying.
Example Cython code:
<..>
cdef class MyStruct:
cdef shared_ptr[mystruct] ptr
def __cinit__(self):
# Do not create new ref here, we will
# pass one in from Cython code
self.ptr = NULL
def __dealloc__(self):
# Do de-allocation here, important!
if self.ptr is not NULL:
<de-alloc>
<rest per MyStruct code above>
cdef object PyStruct(shared_ptr[mystruct] MyStruct_ptr):
"""Python object factory class taking Cpp mystruct pointer
as argument
"""
# Create new MyStruct object. This does not create
# new structure but does allocate a null pointer
cdef MyStruct _mystruct = MyStruct()
# Set pointer of cdef class to existing struct ptr
_mystruct.ptr = MyStruct_ptr
# Return the wrapped MyStruct object with MyStruct_ptr
return _mystruct
def make_structure():
"""Function to create new Cpp mystruct and return
python object representation of it
"""
cdef MyStruct mypystruct = PyStruct(new mystruct)
return mypystruct
Note the type for the argument of PyStruct
is a pointer to the Cpp struct.
mypystruct
then is a python object of class MyStruct
, as returned by the factory class, which refers to the
Cpp mystruct without copying. mypystruct
can be safely returned in def
cython functions and used in python space, per make_structure
code.
To return a Python object of an existing Cpp mystruct
pointer just wrap it with PyStruct
like
return PyStruct(my_cpp_struct_ptr)
anywhere in your Cython code.
Obviously only def
functions are visible there so the Cpp function calls would need to be wrapped as well inside MyStruct if they are to be used in Python space, at least if you want the Cpp function calls inside the Cython class to let go of the GiL (probably worth doing for obvious reasons).
For a real-world example see this Cython extension code and the underlying C code bindings in Cython. Also see this code for Python function wrapping of C function calls that let go of GIL. Not Cpp but same applies.
See also official Cython documentation on when a factory class/function is needed (Note that all constructor arguments will be passed as Python objects
). For built in types, Cython does this conversion for you but for custom structures or objects a factory class/function is needed.
The Cpp structure initialisation could be handled in __new__
of PyStruct
if needed, per suggestion above, if you want the factory class to actually create the C++ structure for you (depends on the use case really).
The benefit of a factory class with pointer arguments is it allows you to use existing pointers of C/C++ structures and wrap them in a Python extension class, rather than always having to create new ones. It would be perfectly safe to, for example, have multiple Python objects referring to the same underlying C struct. Python's ref counting ensures they won't be de-allocated prematurely. You should still check for null when deallocating though as the shared pointer could already had been de-allocated explicitly (eg, by del
).
Note that there is, however, some overhead in creating new python objects even if they do point to the same C++ structure. Not a lot, but still.
IMO this auto de-allocation and ref counting of C/C++ pointers is one of the greatest features of Python's C extension API. As all that acts on Python objects (alone), the C/C++ structures need to be wrapped in a compatible Python object
class definition.
Note - My experience is mostly in C, the above may need adjusting as I'm more familiar with regular C pointers than C++'s shared pointers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With