I'm writing Python bindings for a C library that uses shared memory buffers to store its internal state. The allocation and freeing of these buffers is done outside of Python by the library itself, but I can indirectly control when this happens by calling wrapped constructor/destructor functions from within Python. I'd like to expose some of the buffers to Python so that I can read from them, and in some cases push values to them. Performance and memory use are important concerns, so I would like to avoid copying data wherever possible.
My current approach is to create a numpy array that provides a direct view onto a ctypes pointer:
```python
import numpy as np
import ctypes as C

libc = C.CDLL('libc.so.6')
# declare pointer-sized types so that addresses are not truncated to a C int
libc.malloc.restype = C.c_void_p
libc.malloc.argtypes = [C.c_size_t]
libc.free.argtypes = [C.c_void_p]


class MyWrapper(object):

    def __init__(self, n=10):
        # buffer allocated by external library
        addr = libc.malloc(C.sizeof(C.c_int) * n)
        self._cbuf = (C.c_int * n).from_address(addr)

    def __del__(self):
        # buffer freed by external library
        libc.free(C.addressof(self._cbuf))
        self._cbuf = None

    @property
    def buffer(self):
        return np.ctypeslib.as_array(self._cbuf)
```
As well as avoiding copies, this also means I can use numpy's indexing and assignment syntax and pass it directly to other numpy functions:
```python
wrap = MyWrapper()
buf = wrap.buffer       # buf is now a writeable view of a C-allocated buffer

buf[:] = np.arange(10)  # this is pretty cool!
buf[::2] += 10

print(wrap.buffer)
# [10  1 12  3 14  5 16  7 18  9]
```
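The no-copy behaviour can be checked without the external library. The following is only a sketch: it uses a ctypes-owned array (`cbuf`, a hypothetical stand-in for the library's buffer) and shows that writes made on either side are visible on the other.

```python
import ctypes as C
import numpy as np

cbuf = (C.c_int * 4)(1, 2, 3, 4)    # ctypes-owned stand-in for the library buffer
arr = np.ctypeslib.as_array(cbuf)   # no copy: arr wraps the same memory

arr[0] = 99                         # write through the array...
seen_from_ctypes = cbuf[0]          # ...and read it back through ctypes

cbuf[1] = -5                        # write through ctypes...
seen_from_numpy = int(arr[1])       # ...and read it back through the array
```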
However, it's also inherently dangerous:
```python
del wrap        # free the pointer

print(buf)      # this is bad!
# [1852404336 1969367156  538978662  538976288  538976288  538976288
#  1752440867 1763734377 1633820787       8548]

# buf[0] = 99   # uncomment this line if you <3 segfaults
```
To make this safer, I need to be able to check whether the underlying C pointer has been freed before I try to read/write to the array contents. I have a few thoughts on how to do this:
- One option is a subclass of `np.ndarray` that holds a reference to the `_cbuf` attribute of `MyWrapper`, checks whether it is `None` before doing any reading/writing to its underlying memory, and raises an exception if this is the case.
- New views onto the same buffer can be created by `.view` casting or slicing, so each of these would need to inherit the reference to `_cbuf` and the method that performs the check. I suspect that this could be achieved by overriding `__array_finalize__`, but I'm not sure exactly how.

How could I implement a subclass of `np.ndarray` that performs this check? Can anyone suggest a better approach?
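As a minimal sketch of the `__array_finalize__` part only: the class below (`OwnerTrackingArray`, a hypothetical name) carries an `_owner` attribute and copies it onto every slice and view. It does not perform any pointer check. It only illustrates how the reference could be inherited.

```python
import numpy as np


class OwnerTrackingArray(np.ndarray):
    """Hypothetical ndarray subclass that carries an `_owner` attribute."""

    def __new__(cls, input_array, owner=None):
        # view casting: reinterpret an existing array as this subclass
        obj = np.asarray(input_array).view(cls)
        obj._owner = owner
        return obj

    def __array_finalize__(self, obj):
        # obj is None only when the array is built directly through
        # ndarray.__new__; for view casting and slicing it is the source array
        if obj is None:
            return
        self._owner = getattr(obj, "_owner", None)


owner = object()
arr = OwnerTrackingArray(np.arange(10, dtype=np.int32), owner=owner)
sliced = arr[::2]           # new-from-template
recast = arr.view(np.int16) # view with a different dtype
```

Both `sliced` and `recast` end up with the same `_owner` object as `arr`, so a check method defined on the subclass could consult it.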
Update: This class does most of what I want:
```python
class SafeBufferView(np.ndarray):

    def __new__(cls, get_buffer, shape=None, dtype=None):
        obj = np.ctypeslib.as_array(get_buffer(), shape).view(cls)
        if dtype is not None:
            obj.dtype = dtype
        obj._get_buffer = get_buffer
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self._get_buffer = getattr(obj, "_get_buffer", None)

    def __array_prepare__(self, out_arr, context=None):
        if not self._get_buffer():
            raise Exception("Dangling pointer!")
        return out_arr

    # this seems very heavy-handed - surely there must be a better way?
    def __getattribute__(self, name):
        if name not in ["__new__", "__array_finalize__", "__array_prepare__",
                        "__getattribute__", "_get_buffer"]:
            if not self._get_buffer():
                raise Exception("Dangling pointer!")
        return super(np.ndarray, self).__getattribute__(name)
```
For example:
```python
wrap = MyWrapper()
sb = SafeBufferView(lambda: wrap._cbuf)
sb[:] = np.arange(10)

print(repr(sb))
# SafeBufferView([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

print(repr(sb[::2]))
# SafeBufferView([0, 2, 4, 6, 8], dtype=int32)

sbv = sb.view(np.double)
print(repr(sbv))
# SafeBufferView([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
#                   1.48539705e-313,   1.90979621e-313])

# we have to call the destructor method of `wrap` explicitly - `del wrap` won't
# do anything because `sb` and `sbv` both hold references to `wrap`
wrap.__del__()

print(sb)                # Exception: Dangling pointer!
print(sb + 1)            # Exception: Dangling pointer!
print(sbv)               # Exception: Dangling pointer!
print(np.sum(sb))        # Exception: Dangling pointer!
print(sb.dot(sb))        # Exception: Dangling pointer!

print(np.dot(sb, sb))    # oops...
# -70104698

print(np.extract(np.ones(10), sb))
# array([251019024,     32522, 498870232,     32522,         4,         5,
#                6,         7,        48,         0], dtype=int32)

# np.copyto(sb, np.ones(10, np.int32))    # don't try this at home, kids!
```
I'm sure there are other edge cases I've missed.
Update 2: I've had a play around with `weakref.proxy`, as suggested by @ivan_pozdeev. It's a nice idea, but unfortunately I can't see how it would work with numpy arrays. I could try to create a weakref to the numpy array returned by `.buffer`:
```python
import weakref

wrap = MyWrapper()
wr = weakref.proxy(wrap.buffer)
print(wr)
# ReferenceError: weakly-referenced object no longer exists
# <weakproxy at 0x7f6fe715efc8 to NoneType at 0x91a870>
```
I think the problem here is that the `np.ndarray` instance returned by `wrap.buffer` immediately goes out of scope. A workaround would be for the class to instantiate the array on initialization, hold a strong reference to it, and have the `.buffer` getter return a `weakref.proxy` to the array:
```python
class MyWrapper2(object):

    def __init__(self, n=10):
        # buffer allocated by external library
        addr = libc.malloc(C.sizeof(C.c_int) * n)
        self._cbuf = (C.c_int * n).from_address(addr)
        self._buffer = np.ctypeslib.as_array(self._cbuf)

    def __del__(self):
        # buffer freed by external library
        libc.free(C.addressof(self._cbuf))
        self._cbuf = None
        self._buffer = None

    @property
    def buffer(self):
        return weakref.proxy(self._buffer)
```
However, this breaks if I create a second view onto the same array whilst the buffer is still allocated:
```python
wrap2 = MyWrapper2()
buf = wrap2.buffer
buf[:] = np.arange(10)

buf2 = buf[:]   # create a second view onto the contents of buf

print(repr(buf))
# <weakproxy at 0x7fec3e709b50 to numpy.ndarray at 0x210ac80>
print(repr(buf2))
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

wrap2.__del__()

print(buf2[:])  # this is bad
# [1291716568    32748 1291716568    32748          0          0          0
#           0         48          0]

print(buf[:])   # WTF?!
# [34525664        0        0        0        0        0        0        0
#         0        0]
```
This is seriously broken - after calling `wrap2.__del__()`, not only can I read and write to `buf2`, which was a numpy array view onto `wrap2._cbuf`, but I can even read and write to `buf`, which should not be possible given that `wrap2.__del__()` sets `wrap2._buffer` to `None`.
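A likely explanation for this behaviour is that slicing through the proxy returns an ordinary `np.ndarray` whose `.base` is a strong reference to the original array, so the proxy's target stays alive as long as that view exists. The sketch below reproduces the effect with a plain NumPy array standing in for `wrap2._buffer` (no C memory involved). It assumes CPython's reference counting, where objects are destroyed as soon as the last strong reference goes away.

```python
import weakref
import numpy as np

original = np.arange(10)         # stands in for wrap2._buffer
proxy = weakref.proxy(original)  # stands in for what wrap2.buffer returns

view = proxy[:]                  # slicing is forwarded to the real array
view_is_plain_ndarray = type(view) is np.ndarray
view_holds_original = view.base is original

del original                     # analogous to setting wrap2._buffer = None

first = int(proxy[0])            # still works: view.base keeps the array alive

del view                         # drop the last strong reference
try:
    proxy[0]
    proxy_dead = False
except ReferenceError:
    proxy_dead = True
```

In other words, the proxy only becomes invalid once every derived view has been dropped, which is why it cannot guard against the underlying C memory being freed.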
You have to keep a reference to your wrapper for as long as any numpy array that views its buffer exists. The easiest way to achieve this is to save the reference in an attribute of the ctypes buffer:
```python
class MyWrapper(object):

    def __init__(self, n=10):
        # buffer allocated by external library
        self.size = n
        self.addr = libc.malloc(C.sizeof(C.c_int) * n)

    def __del__(self):
        # buffer freed by external library
        libc.free(self.addr)

    @property
    def buffer(self):
        buf = (C.c_int * self.size).from_address(self.addr)
        buf._wrapper = self
        return np.ctypeslib.as_array(buf)
```
This way your wrapper is automatically freed when the last reference, e.g. the last numpy array, is garbage collected.
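The lifetime behaviour can be sketched without touching `malloc`. In the example below, `FakeLibWrapper` and the `freed` list are hypothetical stand-ins: the memory is owned by a ctypes array rather than the external library, and `__del__` records the "free" instead of calling `libc.free`. It assumes CPython's reference counting and that the array returned by `np.ctypeslib.as_array` keeps the ctypes object alive.

```python
import ctypes as C
import numpy as np

freed = []  # addresses "freed" by the stand-in library


class FakeLibWrapper(object):
    """Hypothetical stand-in for a wrapper around externally allocated memory."""

    def __init__(self, n=10):
        self.size = n
        self._storage = (C.c_int * n)()   # plays the role of the external allocation
        self.addr = C.addressof(self._storage)

    def __del__(self):
        freed.append(self.addr)           # plays the role of libc.free

    @property
    def buffer(self):
        buf = (C.c_int * self.size).from_address(self.addr)
        buf._wrapper = self               # array -> buf -> wrapper keeps memory alive
        return np.ctypeslib.as_array(buf)


wrap = FakeLibWrapper()
arr = wrap.buffer
arr[:] = np.arange(10)

del wrap                                  # the array still references the wrapper
alive_after_del_wrap = (len(freed) == 0)
total = int(arr.sum())

del arr                                   # last reference gone: __del__ runs
freed_after_del_arr = (len(freed) == 1)
```

The trade-off is that `del wrap` no longer frees the buffer by itself. The memory is released only once every array derived from it has been dropped.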