Pickle Cython Class with C pointers

I am trying to write a __reduce__() method for a Cython class that contains C pointers, but have so far found very little information on the best way to do this. There are plenty of examples of how to write a __reduce__() method when the member data are NumPy arrays. I'd like to stay away from NumPy arrays, as they seem to always be stored as Python objects and require calls to and from the Python API. I come from a C background, so I am very comfortable managing memory manually with malloc() and free(), and I am trying to keep Python interaction to an absolute minimum.

However, I have run into a problem. I need something equivalent to copy.deepcopy() for the class I am creating, callable from the Python script where it will ultimately be used. I have found that the only good way to do this is to implement the pickle protocol for the class by writing a __reduce__() method. This is trivial for most primitives and Python objects, but I am at an absolute loss for how to do it for dynamically allocated C arrays. Obviously I can't return the pointer itself, as the underlying memory will have disappeared by the time the object is reconstructed, so what's the best way to do this? I'm sure this will require modifying both __reduce__() and one or both of the initialisation methods (__cinit__() and __init__()).
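
For context: when a class defines no __deepcopy__(), copy.deepcopy() falls back on the same reduction hook that pickle uses, so a single __reduce__() covers both pickling and deep copying. A minimal pure-Python sketch (the class name Tracked and its call counter are hypothetical, purely for illustration):

```python
import copy


class Tracked:
    """Hypothetical class that counts how often __reduce__ is called."""

    reduce_calls = 0

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # copy.deepcopy() ends up here because Tracked defines no __deepcopy__
        Tracked.reduce_calls += 1
        # (callable, args): deepcopy deep-copies args, then calls Tracked(*args)
        return (Tracked, (self.value,))


original = Tracked([1, 2, 3])
clone = copy.deepcopy(original)
```

After the copy, __reduce__ has been called once and clone.value is a new list equal to the original. Defining __deepcopy__() directly is an alternative if only copying, not pickling, is needed.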

I have read the Python documentation on pickling extension types found here, as well as just about every other question on Stack Overflow about pickling Cython classes, such as this question.

A condensed version of my class looks something like this:

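from libc.stdlib cimport malloc, free

cdef class Bin:
    cdef int* job_ids
    cdef int* jobs
    cdef int primitive_data

    def __cinit__(self):
        self.job_ids = <int*>malloc(40 * sizeof(int))
        self.jobs = <int*>malloc(40 * sizeof(int))
        if not self.job_ids or not self.jobs:
            raise MemoryError()

    def __init__(self, int val):
        self.primitive_data = val

    def __dealloc__(self):
        # members must be accessed through self; free(NULL) is a no-op
        free(self.job_ids)
        free(self.jobs)

    def __reduce__(self):
        # the arguments must be a tuple, hence the trailing comma;
        # the contents of job_ids and jobs are not saved yet
        return (self.__class__, (self.primitive_data,))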

MS-DDOS asked Mar 30 '16 06:03


1 Answer

One approach is to serialise the data in your array into a Python bytes object. The __reduce__ method first calls the get_data method, which casts the data pointer to <char*> and slices it to the buffer's size in bytes, producing a <bytes> object (if you try to cast the long* to bytes directly, Cython doesn't know how to do it). __reduce__ returns this object along with a reference to the rebuild function (a module-level function, not a method!), which pickle calls on load to recreate the instance via the set_data method. If you need to pass more than one array, as in your example, just make rebuild accept more arguments and extend the tuple returned by __reduce__.

I haven't done much testing on this, but it seems to work. It would probably explode if you passed it malformed data.

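from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.string cimport memcpy

cdef int length = 40  # number of longs in the buffer

cdef class MyClass:
    cdef long *data

    def __cinit__(self):
        self.data = <long*>PyMem_Malloc(sizeof(long)*length)
        if not self.data:
            raise MemoryError()

    cdef bytes get_data(self):
        # slicing a char* with an explicit end copies exactly that many bytes
        return <bytes>(<char *>self.data)[:sizeof(long)*length]

    cdef int set_data(self, bytes data) except -1:
        # refuse data of the wrong size instead of reading past its end
        if len(data) != <Py_ssize_t>(sizeof(long)*length):
            raise ValueError("pickled data has the wrong length")
        memcpy(self.data, <char*>data, sizeof(long)*length)
        return 0

    def set_values(self):
        # assign some dummy data to the array: 0..length-1
        for n in range(0, length):
            self.data[n] = n

    def get(self, int i):
        # get the ith value of the data (no bounds checking)
        return self.data[i]

    def __reduce__(self):
        # (callable, args): on load, pickle calls rebuild(data)
        data = self.get_data()
        return (rebuild, (data,))

    def __dealloc__(self):
        PyMem_Free(self.data)  # safe even if the allocation failed (NULL)

cpdef object rebuild(bytes data):
    # module-level so pickle can find it by name; typing c as MyClass
    # lets Cython call the cdef method set_data on it
    cdef MyClass c = MyClass()
    c.set_data(data)
    return c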

Example usage (assuming MyClass is in hello.pyx):

import hello
import pickle

c1 = hello.MyClass()
c1.set_values()
print('c1', c1)
print('fifth item', c1.get(5))

d = pickle.dumps(c1)
del c1  # delete the original object

c2 = pickle.loads(d)
print('c2', c2)
print('fifth item', c2.get(5))
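
Applied to the question's Bin class (two int arrays plus primitive_data), the "more arrays, more arguments" idea might look like the sketch below. It is a pure-Python analogue so it runs without compiling anything: ctypes arrays stand in for the malloc'd buffers, bytes() of an array plays the role of slicing a char*, and ctypes.memmove plays the role of memcpy. The names rebuild_bin, N_JOBS and ARRAY_BYTES, and the length check, are assumptions of this sketch, not part of either post.

```python
import copy
import ctypes

N_JOBS = 40                       # capacity from the question's malloc(40 * sizeof(int))
IntArray = ctypes.c_int * N_JOBS  # stand-in for a malloc'd int[40]
ARRAY_BYTES = ctypes.sizeof(IntArray)


class Bin:
    """Pure-Python analogue of the question's cdef class Bin."""

    def __init__(self, val):
        self.primitive_data = val
        self.job_ids = IntArray()  # ctypes zero-fills, unlike malloc
        self.jobs = IntArray()

    def __reduce__(self):
        # one bytes object per raw buffer, plus the primitive member
        return (rebuild_bin,
                (self.primitive_data, bytes(self.job_ids), bytes(self.jobs)))


def rebuild_bin(val, job_ids, jobs):
    """Hypothetical module-level rebuild function: one argument per array."""
    for blob in (job_ids, jobs):
        # reject malformed input instead of copying past the end of a buffer
        if len(blob) != ARRAY_BYTES:
            raise ValueError("expected %d bytes, got %d" % (ARRAY_BYTES, len(blob)))
    b = Bin(val)
    # memmove copies the raw bytes into the freshly allocated arrays
    ctypes.memmove(b.job_ids, job_ids, ARRAY_BYTES)
    ctypes.memmove(b.jobs, jobs, ARRAY_BYTES)
    return b


original = Bin(7)
original.job_ids[3] = 42
original.jobs[0] = -1
clone = copy.deepcopy(original)  # goes through __reduce__ and rebuild_bin
```

pickle.loads(pickle.dumps(original)) takes the same path, as long as Bin and rebuild_bin live at module level where pickle can import them by name. Note that the pickled bytes are raw native memory, so they depend on the machine's byte order and on sizeof(int) / sizeof(long) (long is 4 bytes on Windows but 8 on 64-bit Linux); such pickles are not portable between platforms that differ in those respects.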

Snorfalorpagus answered Oct 21 '22 03:10