Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I force a numpy ndarray to take ownership of its memory?

I have a C function that mallocs() and populates a 2D array of floats. It "returns" that address and the size of the array. The signature is

int get_array_c(float** addr, int* nrows, int* ncols);

I want to call it from Python, so I use ctypes.

import ctypes
mylib = ctypes.cdll.LoadLibrary('mylib.so')
get_array_c = mylib.get_array_c

I never figured out how to specify argument types with ctypes. I tend to just write a python wrapper for each C function I'm using, and make sure I get the types right in the wrapper. The array of floats is a matrix in column-major order, and I'd like to get it as a numpy.ndarray. But its pretty big, so I want to use the memory allocated by the C function, not copy it. (I just found this PyBuffer_FromMemory stuff in this StackOverflow answer: https://stackoverflow.com/a/4355701/3691)

buffer_from_memory = ctypes.pythonapi.PyBuffer_FromMemory
buffer_from_memory.restype = ctypes.py_object

import numpy
def get_array_py():
    nrows = ctypes.c_int()
    ncols = ctypes.c_int()
    addr_ptr = ctypes.POINTER(ctypes.c_float)()
    get_array_c(ctypes.byref(addr_ptr), ctypes.byref(nrows), ctypes.byref(ncols))
    buf = buffer_from_memory(addr_ptr, 4 * nrows * ncols)
    return numpy.ndarray((nrows, ncols), dtype=numpy.float32, order='F',
                         buffer=buf)

This seems to give me an array with the right values. But I'm pretty sure it's a memory leak.

>>> a = get_array_py()
>>> a.flags.owndata
False

The array doesn't own the memory. Fair enough; by default, when the array is created from a buffer, it shouldn't. But in this case it should. When the numpy array is deleted, I'd really like python to free the buffer memory for me. It seems like if I could force owndata to True, that should do it, but owndata isn't settable.

Unsatisfactory solutions:

  1. Make the caller of get_array_py() responsible for freeing the memory. That's super annoying; the caller should be able to treat this numpy array just like any other numpy array.

  2. Copy the original array into a new numpy array (with its own, separate memory) in get_array_py, delete the first array, and free the memory inside get_array_py(). Return the copy instead of the original array. This is annoying because it's an ought-to-be unnecessary memory copy.

Is there a way to do what I want? I can't modify the C function itself, although I could add another C function to the library if that's helpful.

like image 411
Tom Future Avatar asked Jan 03 '12 06:01

Tom Future


People also ask

What is the difference between Ndarray and array in NumPy?

array is just a convenience function to create an ndarray ; it is not a class itself. You can also create an array using numpy. ndarray , but it is not the recommended way. From the docstring of numpy.

How are NumPy arrays stored in memory?

A NumPy array can be specified to be stored in row-major format, using the keyword argument order='C' , and the column-major format, using the keyword argument order='F' , when the array is created or reshaped. The default format is row-major.

Is NumPy array memory efficient?

NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further.

What is __ Array_interface __?

__array_interface__ A dictionary of items (3 required and 5 optional). The optional keys in the dictionary have implied defaults if they are not provided. The keys are: shape (required) Tuple whose elements are the array size in each dimension.


1 Answers

I just stumbled upon this question, which is still an issue in August 2013. Numpy is really picky about the OWNDATA flag: There is no way it can be modified on the Python level, so ctypes will most likely not be able to do this. On the numpy C-API level - and now we are talking about a completely different way of making Python extension modules - one has to explicitly set the flag with:

PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);

On numpy < 1.7, one had to be even more explicit:

((PyArrayObject*)arr)->flags |= NPY_OWNDATA;

If one has any control over the underlying C function/library, the best solution is to pass it an empty numpy array of the appropriate size from Python to store the result in. The basic principle is that memory allocation should always be done on the highest level possible, in this case on the level of the Python interpreter.


As kynan commented below, if you use Cython, you have to expose the function PyArray_ENABLEFLAGS manually, see this post Force NumPy ndarray to take ownership of its memory in Cython.

The relevant documentation is here and here.

like image 143
Stefan Avatar answered Oct 24 '22 19:10

Stefan