 

Handling C++ arrays in Cython (with numpy and pytorch)

I am trying to use Cython to wrap a C++ library (fastText, if it's relevant). The C++ library classes load a very large array from disk. My wrapper instantiates a class from the C++ library to load the array, then uses Cython memoryviews and numpy.asarray to turn the array into a numpy array, then calls torch.from_numpy to create a tensor.

The problem is how to handle deallocating the memory for the array.

Right now, I get "pointer being freed was not allocated" when the program exits. This is, I expect, because both the C++ code and numpy/pytorch are trying to manage the same chunk of RAM.

I could simply comment out the destructor in the C++ library, but that feels like it's going to cause me a different problem down the road.

How should I approach the issue? Is there any kind of best practices documentation somewhere on how to handle memory sharing with C++ and cython?

If I modify the C++ library to wrap the array in a shared_ptr, will cython (and numpy, pytorch, etc.) share the shared_ptr properly?

I apologize if the question is naive; Python garbage collection is very mysterious to me.

Any advice is appreciated.

Asked by Bob, Jan 03 '23 22:01



1 Answer

I can think of three sensible ways of doing it. I'll outline them below (i.e. none of the code will be complete but hopefully it will be clear how to finish it).

1. C++ owns the memory; Cython/Python holds a shared pointer to the C++ class

(This looks to be along the lines you're already thinking.)

Start by creating a Cython class that holds a shared pointer:

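from libcpp.memory cimport shared_ptr

cdef class Holder:
    cdef shared_ptr[cpp_class] ptr

    @staticmethod
    cdef Holder make_holder(shared_ptr[cpp_class] ptr):
        # the local must be typed as Holder so that .ptr is accessible
        cdef Holder holder = Holder()  # empty instance
        holder.ptr = ptr               # share ownership of the C++ object
        return holder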

You then need to define the buffer protocol for Holder. This allows direct access to the memory allocated by cpp_class in a way that both numpy arrays and Cython memoryviews can understand. Each of them holds a reference to the Holder instance, which in turn keeps the cpp_class alive. (Use np.asarray(holder_instance) to create a numpy array that uses the instance's memory.)

The buffer protocol is a little involved but Cython has fairly extensive documentation and you should largely be able to copy and paste their examples. The two methods you need to add to Holder are __getbuffer__ and __releasebuffer__.
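The keep-alive chain described above can be observed from plain Python, without any Cython. The sketch below is only an analogue, not the wrapper itself: it assumes the standard-library array.array as a stand-in for Holder (both export the buffer protocol) and memoryview as a stand-in for the numpy array that np.asarray(holder_instance) would create. The name make_owner is hypothetical.

```python
import array
import gc
import weakref


def make_owner(n):
    """Stand-in for Holder: owns a block of doubles and exports a buffer."""
    return array.array("d", [float(i) for i in range(n)])


owner = make_owner(4)
owner_alive = weakref.ref(owner)  # lets us check whether the owner still exists

# A buffer consumer (memoryview here, a numpy array in the real wrapper)
# shares the owner's memory without copying and keeps a reference to it.
view = memoryview(owner)
view[0] = 42.0
shares_memory = owner[0] == 42.0

# Dropping our own name for the owner does not free the memory,
# because the view still references the exporting object.
del owner
gc.collect()
kept_alive = owner_alive() is not None and view[0] == 42.0

# Once the last consumer lets go, the owner can be destroyed.
view.release()
del view
gc.collect()
freed_after_release = owner_alive() is None
```

In the Cython version the same thing happens one level further down: the numpy array references the Holder, and the Holder's shared_ptr keeps the C++ object (and therefore its array) alive until the last Python reference disappears.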

2. Python owns the memory; Your C++ class holds a pointer to the Python object

In this version you allocate the memory as a numpy array (using the Python C API interface). When your C++ class is destructed it decrements the reference count of the array; however, if Python still holds references to that array, the array can outlive the C++ class.

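#include <Python.h>  // include before any standard headers
#include <numpy/arrayobject.h>

class cpp_class {
   private:
     PyObject* arr;
     double* data;
   public:
     cpp_class() {
       // import_array() must have been called before any NumPy C API function is used
       arr = PyArray_SimpleNew(...); // details left to be filled in
       // PyArray_DATA returns void*, so cast it to the element type
       data = static_cast<double*>(PyArray_DATA(reinterpret_cast<PyArrayObject*>(arr)));
       // fill in the data
     }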

     ~cpp_class() {
         Py_DECREF(arr); // release our reference to it
     }

     PyObject* get_np_array() {
         Py_INCREF(arr); // Cython expects this to be done before it receives a PyObject
         return arr;
     }
};

See the numpy documentation for details of how to allocate numpy arrays from C/C++. Be careful with reference counting if you define copy/move constructors.

The Cython wrapper then looks like:

cdef extern from "some_header.hpp":
    cdef cppclass cpp_class:
       # whatever constructors you want to allow
       object get_np_array()

3. C++ transfers ownership of the data to Python/Cython

In this scheme C++ allocates the array, but Cython/Python is responsible for deallocating it. Once ownership is transferred C++ no longer has access to the data.

class cpp_class {
   public:
     double* data; // for simplicity this is public - you may want to use accessors
     cpp_class() :
     data(new double[50])
     {/* fill the array as needed */}

     ~cpp_class() {
       delete [] data;
     }
};

// helper function for Cython
inline void del_cpp_array(double* a) {
   delete [] a;
}

You then use the cython.view.array class to capture the allocated memory. This has a callback function which is used on destruction:

from cython cimport view

cdef extern from "some_header.hpp":
   cdef cppclass cpp_class:
      double* data
      # whatever constructors and other functions
   void del_cpp_array(double*)

# later
cdef cpp_class cpp_instance # create this however you like
# ...
# modify line below to match your data
cdef view.array arr = view.array(shape=(10, 2), itemsize=sizeof(double), format="d",
                                 mode="C", allocate_buffer=False)
arr.data = <char*>cpp_instance.data
cpp_instance.data = NULL # reset to NULL pointer so the C++ destructor frees nothing
arr.callback_free_data = del_cpp_array

arr can then be used with a memoryview or a numpy array.

You may have to adjust the types used with del_cpp_array. As far as I can tell, callback_free_data expects a function taking a void*, so the helper may need to accept void* and cast back to double* before calling delete [].
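The ownership hand-off in this scheme can be modelled in plain Python to see why the pointer reset matters. This is a toy simulation under stated assumptions, not real memory management: FakeHeap, CppLike, OwningView and transfer are all hypothetical names, FakeHeap imitates an allocator that complains on a double free, and weakref.finalize plays the role of callback_free_data.

```python
import gc
import weakref


class FakeHeap:
    """Toy allocator that raises on a double free (hypothetical, for illustration)."""

    def __init__(self):
        self.live = {}
        self.next_id = 1

    def alloc(self, n):
        ptr = self.next_id
        self.next_id += 1
        self.live[ptr] = [0.0] * n
        return ptr

    def free(self, ptr):
        if ptr is None:  # like delete [] on a null pointer: a no-op
            return
        if ptr not in self.live:
            raise RuntimeError("pointer being freed was not allocated")
        del self.live[ptr]


class CppLike:
    """Stand-in for cpp_class: allocates on construction, frees in close()."""

    def __init__(self, heap, n):
        self.heap = heap
        self.data = heap.alloc(n)

    def close(self):  # plays the role of ~cpp_class()
        self.heap.free(self.data)
        self.data = None


class OwningView:
    """Stand-in for view.array(..., allocate_buffer=False) with a free callback."""

    def __init__(self, ptr, free_callback):
        self.ptr = ptr
        # finalize calls free_callback(ptr) once, when this object is collected
        self._finalizer = weakref.finalize(self, free_callback, ptr)


def transfer(instance):
    """Move ownership of instance.data into an OwningView and null the source."""
    owning_view = OwningView(instance.data, instance.heap.free)
    instance.data = None
    return owning_view


heap = FakeHeap()
inst = CppLike(heap, 20)
arr = transfer(inst)
inst.close()  # safe: the instance no longer owns anything
alive_after_close = len(heap.live) == 1
del arr
gc.collect()
freed_once = len(heap.live) == 0

# Without the reset, two owners free the same block and the second free fails.
shared = CppLike(heap, 4)
stale_ptr = shared.data
shared.close()
try:
    heap.free(stale_ptr)
    double_free_detected = False
except RuntimeError:
    double_free_detected = True
```

The last few lines reproduce the failure mode from the question: two parties both believe they own the block. Resetting the source pointer at the moment of transfer is what guarantees that exactly one of them frees it.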


The first option is probably the most work to implement but requires few changes to the C++ code. The second option may require changes to your C++ code that you don't want to make. The third option is simple but means that C++ no longer has access to the data, which might be a disadvantage.

Answered by DavidW, Jan 06 '23 11:01
