Binding C array to Numpy array without copying

I am writing a Python class that wraps a C module containing a C struct. I am using Cython (a superset of Python that adds C types and C function calls). The C struct is malloc'd in the constructor and contains an array that I want to use in Python. The array will be represented in Python as a NumPy array, but I don't want to copy the values into it. I want to link the NumPy array directly to the malloc'd memory. For this task I use the NumPy Array API, specifically this function:

PyObject* PyArray_SimpleNewFromData(int nd, npy_intp* dims, int typenum, void* data)

I managed to bind the NumPy array to the C struct's array using this code in Cython and it works well as long as the NumPy array and MultimediaParams object have the same lifetime:

cdef class MultimediaParams:
    def __init__(self, **kwargs):
        self._mm_np = < mm_np *> malloc(sizeof(mm_np))
        #some code...

    def as_ndarray(self): #TODO: what if self deallocated but numpy array still exists(segfault?)
        cdef numpy.npy_intp shape[1]
        cdef int arr_size = sizeof(self._mm_np[0].n2) / sizeof(self._mm_np[0].n2[0])
        shape[0] = < numpy.npy_intp > arr_size
        cdef numpy.ndarray ndarray
        ndarray = numpy.PyArray_SimpleNewFromData(1, shape, numpy.NPY_DOUBLE, self._mm_np[0].n2)

        return ndarray

    def __dealloc__(self):
        free(self._mm_np)

As you can see, the class has a __dealloc__ method that frees the C-allocated memory once there are no more references to the MultimediaParams instance.

With this kind of binding, NumPy does not own the array's memory.

The problem: when the MultimediaParams object is deallocated and the array's memory is freed, the NumPy object still points to the memory that was just freed. Any later access or modification through the NumPy object then touches freed memory, which can cause a segfault.

How can I make sure the MultimediaParams object is not deallocated as long as there is a NumPy object using its memory?

As I understand it, all I need to do is make the NumPy object hold a reference to the MultimediaParams instance whose memory it points to. I tried ndarray.base = <PyObject*>self so that NumPy knows its base object; this is supposed to add another reference to the MultimediaParams instance and keep it from being deallocated as long as the NumPy array is alive. However, this line causes my tests to fail because the contents of the NumPy array turn to garbage.

CLARIFICATION: The NumPy array does not take ownership of the C array memory and I don't want it to. I want MultimediaParams to be responsible for freeing the C struct (that contains the array data), but not to do it as long as the NumPy object is alive.

Any suggestions?

Max Segal asked Nov 02 '15 12:11


1 Answer

As @J.F.Sebastian's comment suggests, the problem is most likely that although you correctly assign a pointer to your MultimediaParams instance to the NumPy array's base reference, you never increase its reference count, because the assignment is made in C rather than in Python. The MultimediaParams object is then probably deallocated prematurely and its memory reused, which is what you see as garbage data in the ndarray.

Manually incrementing the reference count of the MultimediaParams object using the macro Py_INCREF should yield the desired behavior.
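A minimal sketch of what that could look like in Cython follows. It is not the asker's actual module: the mm_np struct here is a hypothetical stand-in with a three-element n2 array (the real definition lives in the wrapped C header), and allocation is moved to __cinit__ as an assumption about how the class would be set up. Instead of assigning to ndarray.base directly, it uses PyArray_SetBaseObject from the NumPy C API, which steals a reference to the base object, so Py_INCREF is called first.

```cython
from libc.stdlib cimport malloc, free
from cpython.ref cimport Py_INCREF
cimport numpy

numpy.import_array()  # must run before any NumPy C-API call

# Hypothetical stand-in for the struct from the wrapped C header.
cdef struct mm_np:
    double n2[3]

cdef class MultimediaParams:
    cdef mm_np* _mm_np

    def __cinit__(self):
        self._mm_np = <mm_np*> malloc(sizeof(mm_np))
        if self._mm_np == NULL:
            raise MemoryError()

    def as_ndarray(self):
        cdef numpy.npy_intp shape[1]
        # Number of elements in the fixed-size C array n2.
        shape[0] = <numpy.npy_intp> (sizeof(self._mm_np[0].n2) // sizeof(self._mm_np[0].n2[0]))
        # The array wraps the struct's memory; NumPy does not own it.
        cdef numpy.ndarray arr = numpy.PyArray_SimpleNewFromData(
            1, shape, numpy.NPY_DOUBLE, self._mm_np[0].n2)
        # PyArray_SetBaseObject steals a reference, so take one first.
        # The array releases it when the array itself is destroyed.
        Py_INCREF(self)
        numpy.PyArray_SetBaseObject(arr, self)
        return arr

    def __dealloc__(self):
        # Runs only after every array created above has been released.
        free(self._mm_np)
```

With this in place, MultimediaParams still frees the struct, but __dealloc__ cannot run while any array returned by as_ndarray() is alive, because each array holds a counted reference to the instance. The numpy.pxd shipped with recent NumPy versions also declares a set_array_base(arr, base) helper that appears to perform the same increment-then-set sequence; check the version you build against before relying on it.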

Henrik answered Oct 09 '22 04:10