I am trying to use cython to wrap a C++ library (fastText, if it's relevant). The C++ library classes load a very large array from disk. My wrapper instantiates a class from the C++ library to load the array, then uses cython memory views and numpy.asarray to turn the array into a numpy array, then calls torch.from_numpy to create a tensor.
The problem arising is how to handle deallocating the memory for the array.
Right now, I get "pointer being freed was not allocated" when the program exits. This is, I expect, because both the C++ code and numpy/pytorch are trying to manage the same chunk of RAM.
I could simply comment out the destructor in the C++ library, but that feels like it's going to cause me a different problem down the road.
How should I approach the issue? Is there any kind of best practices documentation somewhere on how to handle memory sharing with C++ and cython?
If I modify the C++ library to wrap the array in a shared_ptr, will cython (and numpy, pytorch, etc.) share the shared_ptr properly?
I apologize if the question is naive; Python garbage collection is very mysterious to me.
Any advice is appreciated.
I can think of three sensible ways of doing it. I'll outline them below (i.e. none of the code will be complete but hopefully it will be clear how to finish it).
Option 1: C++ owns the memory and Cython/Python holds a shared pointer to the C++ class. (This looks to be along the lines you're already thinking.)
Start by creating a Cython class that holds a shared pointer:

    from libcpp.memory cimport shared_ptr

    cdef class Holder:
        cdef shared_ptr[cpp_class] ptr

        @staticmethod
        cdef make_holder(shared_ptr[cpp_class] ptr):
            cdef Holder holder = Holder()  # empty class; typed so .ptr is accessible
            holder.ptr = ptr
            return holder
You then need to define the buffer protocol for Holder. This allows direct access to the memory allocated by cpp_class in a way that both numpy arrays and Cython memoryviews can understand. Thus they hold a reference to a Holder instance, which in turn keeps a cpp_class alive. (Use np.asarray(holder_instance) to create a numpy array that uses the instance's memory.)
The buffer protocol is a little involved but Cython has fairly extensive documentation and you should largely be able to copy and paste their examples. The two methods you need to add to Holder are __getbuffer__ and __releasebuffer__.
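To illustrate the lifetime rule this option relies on, here is a minimal pure C++ sketch. The cpp_class below is a hypothetical stand-in for the library class (it just owns a vector and counts live instances), and the Holder struct plays the role of the Cython Holder; neither is the real fastText or Cython code. It only shows that the array stays valid for as long as any shared_ptr owner remains.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the wrapped library class: it owns a large array.
struct cpp_class {
    std::vector<double> data;
    static int live_instances;  // number of objects not yet destroyed
    explicit cpp_class(std::size_t n) : data(n, 0.0) { ++live_instances; }
    ~cpp_class() { --live_instances; }
};
int cpp_class::live_instances = 0;

// Plays the role of the Cython Holder: one more owner of the same object.
struct Holder {
    std::shared_ptr<cpp_class> ptr;
    double* buffer() const { return ptr->data.data(); }
};

// Returns true if the array is still usable after the C++ side lets go.
bool array_outlives_cpp_owner() {
    auto owner = std::make_shared<cpp_class>(50);
    Holder holder{owner};      // two owners now share the object
    owner.reset();             // C++ side drops its reference
    bool alive = (cpp_class::live_instances == 1);
    holder.buffer()[0] = 1.5;  // memory is still valid through the holder
    return alive && holder.ptr.use_count() == 1 && holder.buffer()[0] == 1.5;
}
```

Once the last owner (here, the holder) goes away, the destructor runs exactly once, which is what avoids the double free.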
Option 2: Python owns the memory and the C++ class holds a pointer to it. In this version you allocate the memory as a numpy array (using the Python C API). When your C++ class is destructed it decrements the reference count of the array; however, if Python holds references to that array then the array can outlive the C++ class.
    #include <Python.h>
    #include <numpy/arrayobject.h>

    class cpp_class {
    private:
        PyObject* arr;
        double* data;
    public:
        cpp_class() {
            arr = PyArray_SimpleNew(...); // details left to be filled in
            // PyArray_DATA returns void*, so cast it to the element type
            data = static_cast<double*>(PyArray_DATA(reinterpret_cast<PyArrayObject*>(arr)));
            // fill in the data
        }
        ~cpp_class() {
            Py_DECREF(arr); // release our reference to it
        }
        PyObject* get_np_array() {
            Py_INCREF(arr); // Cython expects this to be done before it receives a PyObject
            return arr;
        }
    };
See the numpy documentation for details of how to allocate numpy arrays from C/C++. Be careful of reference counting if you define copy/move constructors.
The Cython wrapper then looks like:

    cdef extern from "some_header.hpp":
        cdef cppclass cpp_class:
            # whatever constructors you want to allow
            object get_np_array()
Option 3: C++ transfers ownership of the data to Python/Cython. In this scheme C++ allocates the array, but Cython/Python is responsible for deallocating it. Once ownership is transferred C++ no longer has access to the data.
    class cpp_class {
    public:
        double* data; // for simplicity this is public - you may want to use accessors
        cpp_class() :
            data(new double[50])
        {/* fill the array as needed */}
        ~cpp_class() {
            delete [] data;
        }
    };

    // helper function for Cython
    inline void del_cpp_array(double* a) {
        delete [] a;
    }
You then use the cython.view.array class to capture the allocated memory. This has a callback function which is used on destruction:
    from cython cimport view

    cdef extern from "some_header.hpp":
        cdef cppclass cpp_class:
            double* data
            # whatever constructors and other functions
        void del_cpp_array(double*)

    # later
    cdef cpp_class cpp_instance  # create this however you like
    # ...
    # modify line below to match your data
    # (arr is typed as view.array so that its C-level attributes can be set)
    cdef view.array arr = view.array(shape=(10, 2), itemsize=sizeof(double), format="d",
                                     mode="C", allocate_buffer=False)
    arr.data = <char*>cpp_instance.data
    cpp_instance.data = NULL  # reset to NULL pointer so the destructor frees nothing
    arr.callback_free_data = del_cpp_array
arr can then be used with a memoryview or a numpy array.
You may have to mess about a bit with casting from void* or char* with del_cpp_array - I'm not sure exactly what types the Cython interface requires.
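One way to sidestep the casting question is to give the helper a void* parameter on the C++ side. The sketch below assumes (I have not verified this against every Cython version) that callback_free_data expects a function of type void (*)(void*). The release() method, the size member and the name del_cpp_array_void are hypothetical additions for illustration, not part of the original class.

```cpp
#include <cstddef>

// Hypothetical variant of the option 3 class with an explicit hand-over method.
class cpp_class {
public:
    double* data;
    std::size_t size;

    explicit cpp_class(std::size_t n) : data(new double[n]()), size(n) {}
    ~cpp_class() { delete[] data; }  // delete[] on a null pointer does nothing

    // Copying would create two owners of the same raw pointer, so forbid it.
    cpp_class(const cpp_class&) = delete;
    cpp_class& operator=(const cpp_class&) = delete;

    // Give the array to the caller; this object no longer owns or sees it.
    double* release() {
        double* p = data;
        data = nullptr;
        size = 0;
        return p;
    }
};

// Deleter taking void*, assumed to match the callback type Cython wants.
inline void del_cpp_array_void(void* a) {
    delete[] static_cast<double*>(a);
}
```

After release() the destructor has nothing left to free, so the only deallocation is the one the callback performs when the Cython array is destroyed.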
The first option is probably the most work to implement but requires few changes to the C++ code. The second option may require changes to your C++ code that you don't want to make. The third option is simple but means that C++ no longer has access to the data, which might be a disadvantage.