Safer way to expose a C-allocated memory buffer using numpy/ctypes?

I'm writing Python bindings for a C library that uses shared memory buffers to store its internal state. The allocation and freeing of these buffers is done outside of Python by the library itself, but I can indirectly control when this happens by calling wrapped constructor/destructor functions from within Python. I'd like to expose some of the buffers to Python so that I can read from them, and in some cases push values to them. Performance and memory use are important concerns, so I would like to avoid copying data wherever possible.

My current approach is to create a numpy array that provides a direct view onto a ctypes pointer:

    import numpy as np
    import ctypes as C

    libc = C.CDLL('libc.so.6')
    # ctypes assumes int arguments and return values by default, which
    # truncates 64-bit pointers, so declare the pointer types explicitly
    libc.malloc.restype = C.c_void_p
    libc.free.argtypes = [C.c_void_p]

    class MyWrapper(object):

        def __init__(self, n=10):
            # buffer allocated by external library
            addr = libc.malloc(C.sizeof(C.c_int) * n)
            self._cbuf = (C.c_int * n).from_address(addr)

        def __del__(self):
            # buffer freed by external library
            libc.free(C.addressof(self._cbuf))
            self._cbuf = None

        @property
        def buffer(self):
            return np.ctypeslib.as_array(self._cbuf)

As well as avoiding copies, this also means I can use numpy's indexing and assignment syntax and pass it directly to other numpy functions:

    wrap = MyWrapper()
    buf = wrap.buffer       # buf is now a writeable view of a C-allocated buffer

    buf[:] = np.arange(10)  # this is pretty cool!
    buf[::2] += 10

    print(wrap.buffer)
    # [10  1 12  3 14  5 16  7 18  9]

However, it's also inherently dangerous:

    del wrap                # free the pointer

    print(buf)              # this is bad!
    # [1852404336 1969367156  538978662  538976288  538976288  538976288
    #  1752440867 1763734377 1633820787       8548]

    # buf[0] = 99           # uncomment this line if you <3 segfaults

To make this safer, I need to be able to check whether the underlying C pointer has been freed before I try to read/write to the array contents. I have a few thoughts on how to do this:

One way would be to generate a subclass of np.ndarray that holds a reference to the _cbuf attribute of MyWrapper, checks whether it is None before doing any reading/writing to its underlying memory, and raises an exception if this is the case.
I could easily generate multiple views onto the same buffer, e.g. by .view casting or slicing, so each of these would need to inherit the reference to _cbuf and the method that performs the check. I suspect that this could be achieved by overriding __array_finalize__, but I'm not sure exactly how.
The "pointer-checking" method would also need to be called before any operation that would read and/or write to the contents of the array. I don't know enough about numpy's internals to have an exhaustive list of methods to override.
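For the `__array_finalize__` part, a minimal sketch of how an attribute can follow every view might look like this. `TaggedArray` and `owner` are hypothetical names used only for illustration, and the data here is an ordinary NumPy array rather than the C buffer; the sketch shows only the propagation, not the pointer check itself:

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass that carries an `owner` reference."""

    def __new__(cls, input_array, owner=None):
        # View-cast an existing array to this subclass, then attach the owner.
        obj = np.asarray(input_array).view(cls)
        obj.owner = owner
        return obj

    def __array_finalize__(self, obj):
        # NumPy calls this for explicit construction (obj is None),
        # for view casting, and for new-from-template cases such as slicing.
        if obj is None:
            return
        # Inherit the owner from the array this one was derived from.
        self.owner = getattr(obj, "owner", None)


token = object()  # stand-in for whatever object tracks the C buffer
base = TaggedArray(np.arange(10, dtype=np.int32), owner=token)
sliced = base[::2]            # slicing goes through __array_finalize__
recast = base.view(np.int16)  # so does dtype view casting
```

Both `sliced` and `recast` stay instances of `TaggedArray` and share the same `owner`, so a check stored on the subclass would at least be reachable from every view created this way.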

How could I implement a subclass of np.ndarray that performs this check? Can anyone suggest a better approach?

Update: This class does most of what I want:

    class SafeBufferView(np.ndarray):

        def __new__(cls, get_buffer, shape=None, dtype=None):
            obj = np.ctypeslib.as_array(get_buffer(), shape).view(cls)
            if dtype is not None:
                obj.dtype = dtype
            obj._get_buffer = get_buffer
            return obj

        def __array_finalize__(self, obj):
            if obj is None: return
            self._get_buffer = getattr(obj, "_get_buffer", None)

        def __array_prepare__(self, out_arr, context=None):
            if not self._get_buffer(): raise Exception("Dangling pointer!")
            return out_arr

        # this seems very heavy-handed - surely there must be a better way?
        def __getattribute__(self, name):
            if name not in ["__new__", "__array_finalize__", "__array_prepare__",
                            "__getattribute__", "_get_buffer"]:
                if not self._get_buffer(): raise Exception("Dangling pointer!")
            return super(np.ndarray, self).__getattribute__(name)

For example:

    wrap = MyWrapper()
    sb = SafeBufferView(lambda: wrap._cbuf)
    sb[:] = np.arange(10)

    print(repr(sb))
    # SafeBufferView([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

    print(repr(sb[::2]))
    # SafeBufferView([0, 2, 4, 6, 8], dtype=int32)

    sbv = sb.view(np.double)
    print(repr(sbv))
    # SafeBufferView([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
    #          1.48539705e-313,   1.90979621e-313])

    # we have to call the destructor method of `wrap` explicitly - `del wrap` won't
    # do anything because `sb` and `sbv` both hold references to `wrap`
    wrap.__del__()

    print(sb)                # Exception: Dangling pointer!
    print(sb + 1)            # Exception: Dangling pointer!
    print(sbv)               # Exception: Dangling pointer!
    print(np.sum(sb))        # Exception: Dangling pointer!
    print(sb.dot(sb))        # Exception: Dangling pointer!

    print(np.dot(sb, sb))    # oops...
    # -70104698

    print(np.extract(np.ones(10), sb))
    # array([251019024,     32522, 498870232,     32522,         4,         5,
    #               6,         7,        48,         0], dtype=int32)

    # np.copyto(sb, np.ones(10, np.int32))    # don't try this at home, kids!

I'm sure there are other edge cases I've missed.

Update 2: I've had a play around with weakref.proxy, as suggested by @ivan_pozdeev. It's a nice idea, but unfortunately I can't see how it would work with numpy arrays. I could try to create a weakref to the numpy array returned by .buffer:

    import weakref

    wrap = MyWrapper()
    wr = weakref.proxy(wrap.buffer)
    print(wr)
    # ReferenceError: weakly-referenced object no longer exists
    # <weakproxy at 0x7f6fe715efc8 to NoneType at 0x91a870>

I think the problem here is that the np.ndarray instance returned by wrap.buffer is garbage collected immediately, because nothing holds a strong reference to it. A workaround would be for the class to instantiate the array on initialization, hold a strong reference to it, and have the .buffer getter return a weakref.proxy to the array:

    class MyWrapper2(object):

        def __init__(self, n=10):
            # buffer allocated by external library
            addr = libc.malloc(C.sizeof(C.c_int) * n)
            self._cbuf = (C.c_int * n).from_address(addr)
            self._buffer = np.ctypeslib.as_array(self._cbuf)

        def __del__(self):
            # buffer freed by external library
            libc.free(C.addressof(self._cbuf))
            self._cbuf = None
            self._buffer = None

        @property
        def buffer(self):
            return weakref.proxy(self._buffer)

However, this breaks if I create a second view onto the same array whilst the buffer is still allocated:

    wrap2 = MyWrapper2()
    buf = wrap2.buffer
    buf[:] = np.arange(10)

    buf2 = buf[:]   # create a second view onto the contents of buf

    print(repr(buf))
    # <weakproxy at 0x7fec3e709b50 to numpy.ndarray at 0x210ac80>
    print(repr(buf2))
    # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

    wrap2.__del__()

    print(buf2[:])  # this is bad
    # [1291716568    32748 1291716568    32748        0        0        0
    #         0       48        0]

    print(buf[:])   # WTF?!
    # [34525664        0        0        0        0        0        0        0
    #         0        0]

This is seriously broken - after calling wrap2.__del__() not only can I read and write to buf2 which was a numpy array view onto wrap2._cbuf, but I can even read and write to buf, which should not be possible given that wrap2.__del__() sets wrap2._buffer to None.
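A likely explanation, sketched here with ordinary NumPy-owned arrays instead of the malloc'd buffer: slicing through the proxy returns a plain ndarray whose `.base` holds a strong reference to the proxied array, so the proxy's referent never dies while that view exists. The helper `proxy_alive` is a hypothetical name introduced only for this illustration:

```python
import weakref
import numpy as np


def proxy_alive(proxy):
    """Return True if the proxy's referent still exists."""
    try:
        proxy.shape  # any attribute access raises if the referent is gone
        return True
    except ReferenceError:
        return False


# Case 1: no view exists, so dropping the only strong reference
# leaves the proxy dangling (in the safe, exception-raising sense).
arr = np.arange(10)
proxy = weakref.proxy(arr)
del arr
alive_without_view = proxy_alive(proxy)

# Case 2: a slice taken through the proxy is a regular ndarray that
# references the original array via .base, which keeps it alive.
arr = np.arange(10)
proxy = weakref.proxy(arr)
view = proxy[:]
view_base_is_arr = view.base is arr
del arr
alive_with_view = proxy_alive(proxy)
```

If the same thing happens with `wrap2._buffer`, then setting `self._buffer = None` in `__del__` only drops one of two strong references: `buf2` still keeps the array object alive, so `buf` remains a valid proxy to an array whose memory has already been freed.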


asked Jun 23 '16 10:06

ali_m

1 Answer

You have to keep a reference to your wrapper for as long as any numpy array onto its buffer exists. The easiest way to achieve this is to save the reference in an attribute of the ctypes buffer:

    class MyWrapper(object):
        def __init__(self, n=10):
            # buffer allocated by external library
            self.size = n
            self.addr = libc.malloc(C.sizeof(C.c_int) * n)

        def __del__(self):
            # buffer freed by external library
            libc.free(self.addr)

        @property
        def buffer(self):
            buf = (C.c_int * self.size).from_address(self.addr)
            buf._wrapper = self
            return np.ctypeslib.as_array(buf)

This way your wrapper is automatically freed when the last reference, e.g. the last numpy array, is garbage collected.
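The lifetime behaviour can be checked without touching libc. The following is only a sketch under stated assumptions: `DemoWrapper` and the `freed` list are hypothetical stand-ins, the "external" allocation is simulated with `ctypes.create_string_buffer`, and `__del__` records the address instead of calling `free`:

```python
import ctypes
import gc
import numpy as np

freed = []  # addresses "released" by the stand-in destructor


class DemoWrapper(object):
    def __init__(self, n=10):
        # Stand-in for memory allocated by an external library.
        self._backing = ctypes.create_string_buffer(n * ctypes.sizeof(ctypes.c_int))
        self.size = n
        self.addr = ctypes.addressof(self._backing)

    def __del__(self):
        # Stand-in for the external free(): just record that it happened.
        freed.append(self.addr)

    @property
    def buffer(self):
        buf = (ctypes.c_int * self.size).from_address(self.addr)
        buf._wrapper = self  # the ctypes array keeps the wrapper alive
        return np.ctypeslib.as_array(buf)


wrapper = DemoWrapper()
arr = wrapper.buffer
view = arr[::2]

del wrapper                           # arrays still reference the wrapper
freed_while_views_exist = list(freed)

del arr, view                         # last references are gone now
gc.collect()
freed_after_views_gone = list(freed)
```

In this sketch nothing is released while `arr` or `view` exists, and the destructor runs once after both are gone. Note that this only delays the free until the arrays disappear; it does not help if the library releases the buffer through some other path while arrays are still alive.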


answered Sep 30 '22 10:09

Daniel

Tags: python, c, dangling-pointer, numpy, ctypes
