
Safer way to expose a C-allocated memory buffer using numpy/ctypes?

I'm writing Python bindings for a C library that uses shared memory buffers to store its internal state. The allocation and freeing of these buffers is done outside of Python by the library itself, but I can indirectly control when this happens by calling wrapped constructor/destructor functions from within Python. I'd like to expose some of the buffers to Python so that I can read from them, and in some cases push values to them. Performance and memory use are important concerns, so I would like to avoid copying data wherever possible.

My current approach is to create a numpy array that provides a direct view onto a ctypes pointer:

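    import numpy as np
    import ctypes as C

    libc = C.CDLL('libc.so.6')
    # declare pointer-sized signatures; ctypes otherwise assumes a C int
    # return value, which can truncate 64-bit addresses
    libc.malloc.restype = C.c_void_p
    libc.malloc.argtypes = [C.c_size_t]
    libc.free.argtypes = [C.c_void_p]

    class MyWrapper(object):

        def __init__(self, n=10):
            # buffer allocated by external library
            addr = libc.malloc(C.sizeof(C.c_int) * n)
            self._cbuf = (C.c_int * n).from_address(addr)

        def __del__(self):
            # buffer freed by external library
            libc.free(C.addressof(self._cbuf))
            self._cbuf = None

        @property
        def buffer(self):
            return np.ctypeslib.as_array(self._cbuf)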

As well as avoiding copies, this also means I can use numpy's indexing and assignment syntax and pass it directly to other numpy functions:

    wrap = MyWrapper()
    buf = wrap.buffer       # buf is now a writeable view of a C-allocated buffer

    buf[:] = np.arange(10)  # this is pretty cool!
    buf[::2] += 10

    print(wrap.buffer)
    # [10  1 12  3 14  5 16  7 18  9]

However, it's also inherently dangerous:

    del wrap                # free the pointer

    print(buf)              # this is bad!
    # [1852404336 1969367156  538978662  538976288  538976288  538976288
    #  1752440867 1763734377 1633820787       8548]

    # buf[0] = 99           # uncomment this line if you <3 segfaults

To make this safer, I need to be able to check whether the underlying C pointer has been freed before I try to read/write to the array contents. I have a few thoughts on how to do this:

  • One way would be to generate a subclass of np.ndarray that holds a reference to the _cbuf attribute of MyWrapper, checks whether it is None before doing any reading/writing to its underlying memory, and raises an exception if this is the case.
  • I could easily generate multiple views onto the same buffer, e.g. by .view casting or slicing, so each of these would need to inherit the reference to _cbuf and the method that performs the check. I suspect that this could be achieved by overriding __array_finalize__, but I'm not sure exactly how.
  • The "pointer-checking" method would also need to be called before any operation that would read and/or write to the contents of the array. I don't know enough about numpy's internals to have an exhaustive list of methods to override.

How could I implement a subclass of np.ndarray that performs this check? Can anyone suggest a better approach?


Update: This class does most of what I want:

    class SafeBufferView(np.ndarray):

        def __new__(cls, get_buffer, shape=None, dtype=None):
            obj = np.ctypeslib.as_array(get_buffer(), shape).view(cls)
            if dtype is not None:
                obj.dtype = dtype
            obj._get_buffer = get_buffer
            return obj

        def __array_finalize__(self, obj):
            if obj is None: return
            self._get_buffer = getattr(obj, "_get_buffer", None)

        def __array_prepare__(self, out_arr, context=None):
            if not self._get_buffer(): raise Exception("Dangling pointer!")
            return out_arr

        # this seems very heavy-handed - surely there must be a better way?
        def __getattribute__(self, name):
            if name not in ["__new__", "__array_finalize__", "__array_prepare__",
                            "__getattribute__", "_get_buffer"]:
                if not self._get_buffer(): raise Exception("Dangling pointer!")
            return super(np.ndarray, self).__getattribute__(name)

For example:

    wrap = MyWrapper()
    sb = SafeBufferView(lambda: wrap._cbuf)
    sb[:] = np.arange(10)

    print(repr(sb))
    # SafeBufferView([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

    print(repr(sb[::2]))
    # SafeBufferView([0, 2, 4, 6, 8], dtype=int32)

    sbv = sb.view(np.double)
    print(repr(sbv))
    # SafeBufferView([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
    #          1.48539705e-313,   1.90979621e-313])

    # we have to call the destructor method of `wrap` explicitly - `del wrap` won't
    # do anything because `sb` and `sbv` both hold references to `wrap`
    wrap.__del__()

    print(sb)                # Exception: Dangling pointer!
    print(sb + 1)            # Exception: Dangling pointer!
    print(sbv)               # Exception: Dangling pointer!
    print(np.sum(sb))        # Exception: Dangling pointer!
    print(sb.dot(sb))        # Exception: Dangling pointer!

    print(np.dot(sb, sb))    # oops...
    # -70104698

    print(np.extract(np.ones(10), sb))
    # array([251019024,     32522, 498870232,     32522,         4,         5,
    #               6,         7,        48,         0], dtype=int32)

    # np.copyto(sb, np.ones(10, np.int32))    # don't try this at home, kids!

I'm sure there are other edge cases I've missed.


Update 2: I've had a play around with weakref.proxy, as suggested by @ivan_pozdeev. It's a nice idea, but unfortunately I can't see how it would work with numpy arrays. I could try to create a weakref to the numpy array returned by .buffer:

    import weakref

    wrap = MyWrapper()
    wr = weakref.proxy(wrap.buffer)
    print(wr)
    # ReferenceError: weakly-referenced object no longer exists
    # <weakproxy at 0x7f6fe715efc8 to NoneType at 0x91a870>

I think the problem here is that the np.ndarray instance returned by wrap.buffer immediately goes out of scope. A workaround would be for the class to instantiate the array on initialization, hold a strong reference to it, and have the .buffer() getter return a weakref.proxy to the array:

    class MyWrapper2(object):

        def __init__(self, n=10):
            # buffer allocated by external library
            addr = libc.malloc(C.sizeof(C.c_int) * n)
            self._cbuf = (C.c_int * n).from_address(addr)
            self._buffer = np.ctypeslib.as_array(self._cbuf)

        def __del__(self):
            # buffer freed by external library
            libc.free(C.addressof(self._cbuf))
            self._cbuf = None
            self._buffer = None

        @property
        def buffer(self):
            return weakref.proxy(self._buffer)

However, this breaks if I create a second view onto the same array whilst the buffer is still allocated:

    wrap2 = MyWrapper2()
    buf = wrap2.buffer
    buf[:] = np.arange(10)

    buf2 = buf[:]   # create a second view onto the contents of buf

    print(repr(buf))
    # <weakproxy at 0x7fec3e709b50 to numpy.ndarray at 0x210ac80>
    print(repr(buf2))
    # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

    wrap2.__del__()

    print(buf2[:])  # this is bad
    # [1291716568    32748 1291716568    32748        0        0        0
    #         0       48        0]

    print(buf[:])   # WTF?!
    # [34525664        0        0        0        0        0        0        0
    #         0        0]

This is seriously broken - after calling wrap2.__del__() not only can I read and write to buf2 which was a numpy array view onto wrap2._cbuf, but I can even read and write to buf, which should not be possible given that wrap2.__del__() sets wrap2._buffer to None.
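One likely explanation for the second symptom can be sketched with a plain NumPy array in place of the C buffer. Slicing through a weakref.proxy returns an ordinary ndarray, and that view holds a strong reference to the proxied array through its .base attribute, so clearing the wrapper's own reference does not invalidate the proxy. The Holder class and the variable names below are hypothetical stand-ins for MyWrapper2, and the sketch assumes CPython's reference counting:

```python
import weakref
import numpy as np


class Holder(object):
    """Hypothetical stand-in for MyWrapper2, minus the C allocation."""

    def __init__(self):
        self._buffer = np.arange(10)

    @property
    def buffer(self):
        return weakref.proxy(self._buffer)


holder = Holder()
buf = holder.buffer
buf2 = buf[:]                 # a real ndarray view, not a proxy

# the view keeps the proxied array alive through its .base attribute
base_is_original = buf2.base is holder._buffer

holder._buffer = None         # drop the holder's strong reference
# the proxy still resolves, because buf2.base still references the array
proxy_still_valid = int(buf[0]) == 0

del buf2                      # drop the last strong reference
try:
    buf[0]
    proxy_dead = False
except ReferenceError:
    proxy_dead = True
```

If this is what happens in MyWrapper2, the proxy only guards against the array object being collected; it says nothing about whether the memory behind it has been freed.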

asked Jun 23 '16 10:06 by ali_m



1 Answer

You have to keep a reference to your wrapper for as long as any numpy array that views its buffer exists. The easiest way to achieve this is to save the reference in an attribute of the ctypes buffer:

    class MyWrapper(object):
        def __init__(self, n=10):
            # buffer allocated by external library
            self.size = n
            self.addr = libc.malloc(C.sizeof(C.c_int) * n)

        def __del__(self):
            # buffer freed by external library
            libc.free(self.addr)

        @property
        def buffer(self):
            buf = (C.c_int * self.size).from_address(self.addr)
            buf._wrapper = self
            return np.ctypeslib.as_array(buf)

This way your wrapper is automatically freed when the last reference to it, e.g. the last numpy array, is garbage collected.
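A minimal sketch of how this behaves in practice, assuming a Unix-like system and CPython's reference counting. It repeats the class above, with two additions that are not part of the original answer: explicit malloc/free signatures, so that 64-bit addresses are not truncated, and a weakref.ref used only to observe when the wrapper is collected:

```python
import ctypes as C
import ctypes.util
import gc
import weakref

import numpy as np

# assumption: a Unix-like libc is available
libc = C.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = C.c_void_p
libc.malloc.argtypes = [C.c_size_t]
libc.free.restype = None
libc.free.argtypes = [C.c_void_p]


class MyWrapper(object):
    def __init__(self, n=10):
        self.size = n
        self.addr = libc.malloc(C.sizeof(C.c_int) * n)

    def __del__(self):
        libc.free(self.addr)

    @property
    def buffer(self):
        buf = (C.c_int * self.size).from_address(self.addr)
        buf._wrapper = self   # the ctypes array keeps the wrapper alive
        return np.ctypeslib.as_array(buf)


wrap = MyWrapper()
alive = weakref.ref(wrap)     # observer only; does not keep wrap alive

buf = wrap.buffer
buf[:] = np.arange(10)
view = buf[::2]               # a second view onto the same memory

del wrap                      # the arrays still reference the wrapper
still_alive = alive() is not None
total = int(buf.sum())        # safe: the memory has not been freed

del buf
alive_with_view = alive() is not None   # `view` alone keeps it alive

del view
gc.collect()
freed = alive() is None       # __del__ has run and the buffer is released
```

The design choice is to invert the ownership: instead of the arrays checking whether the buffer is still valid, the buffer cannot be freed until no array refers to it. This only helps if the wrapper's `__del__` is the sole place where the library frees the memory.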

answered Sep 30 '22 10:09 by Daniel