I am trying to write a __reduce__() method for a Cython class that contains C pointers, but have so far found very little information on the best way to go about doing this. There are tons of examples of how to properly write a __reduce__() method when using NumPy arrays as member data. I'd like to stay away from NumPy arrays, as they seem to always be stored as Python objects and require calls to and from the Python API. I come from a C background, so I am very comfortable working with memory manually using calls to malloc() and free(), and am trying to keep Python interaction to an absolute minimum.
However, I have run into a problem. I need something equivalent to copy.deepcopy() for the class I am creating, callable from the Python script where it will ultimately be used. As far as I can tell, the only good way to do this is to implement the pickle protocol for the class by writing a __reduce__() method. This is trivial with most primitives or Python objects, but I am at a loss for how to do it for dynamically allocated C arrays. Obviously I can't return the pointer itself, as the underlying memory will have disappeared by the time the object is reconstructed, so what's the best way to do this? I'm sure it will require modification of the __reduce__() method as well as one or both of the __init__() methods.
I have read the Python documentation on pickling extension types found here, as well as just about every other question on Stack Overflow about pickling Cython classes, such as this question.
A condensed version of my class looks something like this:
from libc.stdlib cimport malloc, free

cdef class Bin:
    cdef int* job_ids
    cdef int* jobs
    cdef int primitive_data

    def __cinit__(self):
        self.job_ids = <int*>malloc(40 * sizeof(int))
        self.jobs = <int*>malloc(40 * sizeof(int))

    def __init__(self, int val):
        self.primitive_data = val

    def __dealloc__(self):
        free(self.job_ids)
        free(self.jobs)

    def __reduce__(self):
        # this loses the contents of job_ids and jobs
        return (self.__class__, (self.primitive_data,))
One approach is to serialise the data in your array into a Python bytes object. The __reduce__ method first calls the get_data method, which casts the data pointer to <char*> and then to <bytes> (if you try to go there directly, Cython doesn't know how to do the conversion). __reduce__ returns this bytes object along with a reference to the rebuild function (a module-level function, not a method!), which can be used to recreate the instance via the set_data method. If you need to pass more than one array, as in your example, you just need to accept more arguments in rebuild and extend the tuple returned by __reduce__.
I haven't done much testing on this but it seems to work. It would probably explode if you passed it malformed data.
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.string cimport memcpy

cdef int length = 40

cdef class MyClass:
    cdef long *data

    def __cinit__(self):
        self.data = <long*>PyMem_Malloc(sizeof(long) * length)
        if not self.data:
            raise MemoryError()

    cdef bytes get_data(self):
        # view the buffer as chars, then slice to get a bytes copy
        return <bytes>(<char *>self.data)[:sizeof(long) * length]

    cdef void set_data(self, bytes data):
        memcpy(self.data, <char*>data, sizeof(long) * length)

    def set_values(self):
        # assign some dummy data to the array: 0..length-1
        for n in range(length):
            self.data[n] = n

    def get(self, i):
        # get the ith value of the data
        return self.data[i]

    def __reduce__(self):
        data = self.get_data()
        return (rebuild, (data,))

    def __dealloc__(self):
        PyMem_Free(self.data)

cpdef object rebuild(bytes data):
    c = MyClass()
    c.set_data(data)
    return c
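For the question's Bin class, which holds two arrays plus an int, the same pattern extends naturally: make one bytes copy per array in __reduce__ and accept one argument per field in rebuild. A sketch along those lines (untested; rebuild_bin and nitems are illustrative names, and it assumes both arrays keep the fixed length of 40):

```cython
from libc.string cimport memcpy

cdef int nitems = 40  # matches the fixed allocation size in Bin

    # inside cdef class Bin:
    def __reduce__(self):
        # one bytes copy per array, plus the primitive field
        job_ids_bytes = <bytes>(<char *>self.job_ids)[:sizeof(int) * nitems]
        jobs_bytes = <bytes>(<char *>self.jobs)[:sizeof(int) * nitems]
        return (rebuild_bin, (job_ids_bytes, jobs_bytes, self.primitive_data))

cpdef object rebuild_bin(bytes job_ids, bytes jobs, int primitive_data):
    cdef Bin b = Bin(primitive_data)  # __cinit__ allocates the arrays
    memcpy(b.job_ids, <char*>job_ids, sizeof(int) * nitems)
    memcpy(b.jobs, <char*>jobs, sizeof(int) * nitems)
    return b
```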
Example usage (assuming MyClass is in hello.pyx):
import hello
import pickle
c1 = hello.MyClass()
c1.set_values()
print('c1', c1)
print('fifth item', c1.get(5))
d = pickle.dumps(c1)
del c1  # delete the original object
c2 = pickle.loads(d)
print('c2', c2)
print('fifth item', c2.get(5))
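Since copy.deepcopy() also goes through __reduce__, the same machinery answers the original question. The round trip can be illustrated in pure Python with a stand-in class that packs its "array" into bytes via the struct module (the names here are illustrative, not part of the Cython code above):

```python
import copy
import pickle
import struct

LENGTH = 40  # mirrors the fixed array length in the Cython example

class IntArray:
    """Pure-Python stand-in for a class owning a C int array."""

    def __init__(self):
        self._data = [0] * LENGTH

    def set_values(self):
        for n in range(LENGTH):
            self._data[n] = n

    def get(self, i):
        return self._data[i]

    def _get_data(self):
        # analogous to casting the pointer to <char*> and slicing to bytes
        return struct.pack('%di' % LENGTH, *self._data)

    def _set_data(self, data):
        self._data = list(struct.unpack('%di' % LENGTH, data))

    def __reduce__(self):
        # module-level rebuild function plus the serialised state
        return (rebuild_intarray, (self._get_data(),))

def rebuild_intarray(data):
    obj = IntArray()
    obj._set_data(data)
    return obj

a = IntArray()
a.set_values()
b = copy.deepcopy(a)               # deepcopy consults __reduce__ too
c = pickle.loads(pickle.dumps(a))
print(b.get(5), c.get(5))          # both print 5
```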