Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Initializing Cython objects with existing C Objects

C++ Model

Say I have the following C++ data structures I wish to expose to Python.

#include <memory>
#include <vector>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

typedef std::vector<std::shared_ptr<mystruct>> mystruct_list;

Boost Python

I can wrap these fairly effectively using boost::python with the following code, easily allowing me to use the existing mystruct (copying the shared_ptr) rather than recreating an existing object.

#include "mystruct.h"
#include <boost/python.hpp>

using namespace boost::python;


BOOST_PYTHON_MODULE(example)
{
    class_<mystruct, std::shared_ptr<mystruct>>("MyStruct", init<>())
        .def_readwrite("a", &mystruct::a);
        // add the rest of the member variables

    class_<mystruct_list>("MyStructList", init<>())
        .def("at", &mystruct_list::at, return_value_policy<copy_const_reference>());
        // add the rest of the member functions
}

Cython

In Cython, I have no idea how to extract an item from mystruct_list, without copying the underlying data. I have no idea how I could initialize MyStruct from the existing shared_ptr<mystruct>, without copying all the data over in one of various forms.

from libcpp.memory cimport shared_ptr
from cython.operator cimport dereference


cdef extern from "mystruct.h" nogil:
    cdef cppclass mystruct:
        int a, b, c, d, e, f, g, h, i, j, k, l, m

    ctypedef vector[v] mystruct_list


cdef class MyStruct:
    cdef shared_ptr[mystruct] ptr

    def __cinit__(MyStruct self):
        self.ptr.reset(new mystruct)

    property a:
        def __get__(MyStruct self):
            return dereference(self.ptr).a

        def __set__(MyStruct self, int value):
            dereference(self.ptr).a = value


cdef class MyStructList:
    cdef mystruct_list c
    cdef mystruct_list.iterator it

    def __cinit__(MyStructList self):
        pass

    def __getitem__(MyStructList self, int index):
        # How do return MyStruct without copying the underlying `mystruct` 
        pass

I see many possible workarounds, and none of them are very satisfactory:

I could initialize an empty MyStruct, and in Cython assign over the shared_ptr. However, this would result in wasting an initalized struct for absolutely no reason.

MyStruct value
value.ptr = self.c.at(index)
return value

I also could copy the data from the existing mystruct to the new mystruct. However, this suffers from similar bloat.

MyStruct value
dereference(value.ptr).a = dereference(self.c.at(index)).a
return value

I could also expose a init=True flag for each __cinit__ method, which would prevent reconstructing the object internally if the C-object exists already (when init is False). However, this could cause catastrophic issues, since it would be exposed to the Python API and would allow dereferencing a null or uninitialized pointer.

def __cinit__(MyStruct self, bint init=True):
    if init:
        self.ptr.reset(new mystruct)

I could also overload __init__ with the Python-exposed constructor (which would reset self.ptr), but this would have risky memory safety if __new__ was used from the Python layer.

Bottom-Line

I would love to use Cython, for compilation speed, syntactical sugar, and numerous other reasons, as opposed to the fairly clunky boost::python. I'm looking at pybind11 right now, and it may solve the compilation speed issues, but I would still prefer to use Cython.

Is there any way I can do such a simple task idiomatically in Cython? Thanks.

like image 259
Alexander Huszagh Avatar asked Jun 21 '17 21:06

Alexander Huszagh


1 Answers

The way this works in Cython is by having a factory class to create Python objects out of the shared pointer. This gives you access to the underlying C/C++ structure without copying.

Example Cython code:

<..>

cdef class MyStruct:
    cdef shared_ptr[mystruct] ptr

    def __cinit__(self):
        # Do not create new ref here, we will
        # pass one in from Cython code
        self.ptr = NULL

    def __dealloc__(self):
        # Do de-allocation here, important!
        if self.ptr is not NULL:
            <de-alloc>

    <rest per MyStruct code above>

cdef object PyStruct(shared_ptr[mystruct] MyStruct_ptr):
    """Python object factory class taking Cpp mystruct pointer
    as argument
    """
    # Create new MyStruct object. This does not create
    # new structure but does allocate a null pointer
    cdef MyStruct _mystruct = MyStruct()
    # Set pointer of cdef class to existing struct ptr
    _mystruct.ptr = MyStruct_ptr
    # Return the wrapped MyStruct object with MyStruct_ptr
    return _mystruct

def make_structure():
    """Function to create new Cpp mystruct and return
    python object representation of it
    """
    cdef MyStruct mypystruct = PyStruct(new mystruct)
    return mypystruct

Note the type for the argument of PyStruct is a pointer to the Cpp struct.

mypystruct then is a python object of class MyStruct, as returned by the factory class, which refers to the Cpp mystruct without copying. mypystruct can be safely returned in def cython functions and used in python space, per make_structure code.

To return a Python object of an existing Cpp mystruct pointer just wrap it with PyStruct like

return PyStruct(my_cpp_struct_ptr)

anywhere in your Cython code.

Obviously only def functions are visible there so the Cpp function calls would need to be wrapped as well inside MyStruct if they are to be used in Python space, at least if you want the Cpp function calls inside the Cython class to let go of the GiL (probably worth doing for obvious reasons).

For a real-world example see this Cython extension code and the underlying C code bindings in Cython. Also see this code for Python function wrapping of C function calls that let go of GIL. Not Cpp but same applies.

See also official Cython documentation on when a factory class/function is needed (Note that all constructor arguments will be passed as Python objects). For built in types, Cython does this conversion for you but for custom structures or objects a factory class/function is needed.

The Cpp structure initialisation could be handled in __new__ of PyStruct if needed, per suggestion above, if you want the factory class to actually create the C++ structure for you (depends on the use case really).

The benefit of a factory class with pointer arguments is it allows you to use existing pointers of C/C++ structures and wrap them in a Python extension class, rather than always having to create new ones. It would be perfectly safe to, for example, have multiple Python objects referring to the same underlying C struct. Python's ref counting ensures they won't be de-allocated prematurely. You should still check for null when deallocating though as the shared pointer could already had been de-allocated explicitly (eg, by del).

Note that there is, however, some overhead in creating new python objects even if they do point to the same C++ structure. Not a lot, but still.

IMO this auto de-allocation and ref counting of C/C++ pointers is one of the greatest features of Python's C extension API. As all that acts on Python objects (alone), the C/C++ structures need to be wrapped in a compatible Python object class definition.

Note - My experience is mostly in C, the above may need adjusting as I'm more familiar with regular C pointers than C++'s shared pointers.

like image 119
danny Avatar answered Nov 18 '22 02:11

danny