
Preserve custom attributes when pickling subclass of numpy array

I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.

I'm manipulating instances of this class within a parallel loop, using Python multiprocessing. As I understand it, the way the scope is essentially 'copied' to the worker processes is via pickle.

The problem I am now coming up against relates to the way that numpy arrays are pickled. I can't find any comprehensive documentation about this, but some discussion among the dill developers suggests that I should be focusing on the __reduce__ method, which is called when the object is pickled.

Can anyone shed any more light on this? The minimal working example is really just the numpy example code I linked to above, copied here for completeness:

import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)

Now here is the problem:

import pickle

obj = RealisticInfoArray([1, 2, 3], info='foo')
print obj.info  # 'foo'

pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print new_obj.info  # raises AttributeError
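To connect this to the multiprocessing motivation: below is a minimal Python 3 sketch (the worker function and its name are illustrative, and it assumes a Unix 'fork' start method) showing the same attribute loss when the array crosses a process boundary, since Pool task arguments are always serialized with pickle regardless of start method:

```python
import multiprocessing as mp
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

def read_info(arr):
    # The worker receives a copy that went through pickle, so the
    # custom attribute is gone unless pickling is taught about it.
    return getattr(arr, 'info', 'MISSING')

obj = RealisticInfoArray([1, 2, 3], info='foo')

# The 'fork' start method (Unix-only) does not re-import the module,
# so no __main__ guard is needed in this short sketch.
with mp.get_context('fork').Pool(2) as pool:
    results = pool.map(read_info, [obj, obj])
print(results)  # each worker sees 'MISSING' rather than 'foo'
```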

Thanks.

Gabriel asked Oct 27 '14 22:10


2 Answers

np.ndarray uses __reduce__ to pickle itself. We can take a look at what it actually returns when you call that function to get an idea of what's going on:

>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>,
 (<class 'pick.RealisticInfoArray'>, (0,), 'b'),
 (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))

So, we get a 3-tuple back. The docs for __reduce__ describe what each element is doing:

When a tuple is returned, it must be between two and five elements long. Optional elements can either be omitted, or None can be provided as their value. The contents of this tuple are pickled as normal and used to reconstruct the object at unpickling time. The semantics of each element are:

  • A callable object that will be called to create the initial version of the object. The next element of the tuple will provide arguments for this callable, and later elements provide additional state information that will subsequently be used to fully reconstruct the pickled data.

    In the unpickling environment this object must be either a class, a callable registered as a “safe constructor” (see below), or it must have an attribute __safe_for_unpickling__ with a true value. Otherwise, an UnpicklingError will be raised in the unpickling environment. Note that as usual, the callable itself is pickled by name.

  • A tuple of arguments for the callable object.

  • Optionally, the object’s state, which will be passed to the object’s __setstate__() method as described in section Pickling and unpickling normal class instances. If the object has no __setstate__() method, then, as above, the value must be a dictionary and it will be added to the object’s __dict__.

So, _reconstruct is the function called to rebuild the object, (<class 'pick.RealisticInfoArray'>, (0,), 'b') are the arguments passed to that function, and (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00') gets passed to the class's __setstate__. This gives us an opportunity: we can override __reduce__ to append our custom attribute to the state tuple, and additionally override __setstate__ to set that attribute when we unpickle. We just need to make sure we preserve all the data the parent class needs, and call the parent's __setstate__, too:

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])

Usage:

>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
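As a quick Python 3 sanity check (repeating the fixed class in a runnable script; `super()` without arguments is the Python 3 spelling), the approach should round-trip under every pickle protocol, since ndarray subclasses fall back to the regular __reduce__ path:

```python
import pickle
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        pickled_state = super().__reduce__()
        # Append our attribute to the parent's state tuple
        return (pickled_state[0], pickled_state[1],
                pickled_state[2] + (self.info,))

    def __setstate__(self, state):
        self.info = state[-1]
        super().__setstate__(state[:-1])

obj = RealisticInfoArray([1, 2, 3], info='foo')
surviving = []
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    new_obj = pickle.loads(pickle.dumps(obj, protocol=proto))
    surviving.append(new_obj.info == 'foo' and np.array_equal(new_obj, obj))
print(all(surviving))  # should print True when the attribute survives every protocol
```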
dano answered Oct 20 '22 00:10


I'm the dill (and pathos) author. dill was pickling a numpy.array before numpy could do it itself. @dano's explanation is pretty accurate. Me personally, I'd just use dill and let it do the job for you. With dill, you don't need __reduce__, as dill has several ways that it grabs subclassed attributes… one of which is storing the __dict__ for any class object. pickle doesn't do this, b/c it usually works with classes by name reference and not storing the class object itself… so you have to work with __reduce__ to make pickle work for you. No need, in most cases, with dill.

>>> import numpy as np
>>>
>>> class RealisticInfoArray(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         # Input array is an already formed ndarray instance
...         # We first cast to be our class type
...         obj = np.asarray(input_array).view(cls)
...         # add the new attribute to the created instance
...         obj.info = info
...         # Finally, we must return the newly created object:
...         return obj
...     def __array_finalize__(self, obj):
...         # see InfoArray.__array_finalize__ for comments
...         if obj is None: return
...         self.info = getattr(obj, 'info', None)
...
>>> import dill as pickle
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> print obj.info  # 'foo'
foo
>>>
>>> pickle_str = pickle.dumps(obj)
>>> new_obj = pickle.loads(pickle_str)
>>> print new_obj.info
foo

dill can extend itself into pickle (essentially by copy_reg'ing everything it knows), so you can then use all dill types in anything that uses pickle. Now, if you are going to use multiprocessing, you are a bit screwed, since it uses cPickle. There is, however, the pathos fork of multiprocessing (called pathos.multiprocessing), whose only real change is that it uses dill instead of cPickle… and thus it can serialize a heck of a lot more in a Pool.map. I think (currently) if you want to work with your subclass of numpy.array in multiprocessing (or pathos.multiprocessing), you might have to do something like @dano suggests -- but I'm not sure, as I couldn't think of a good case off the top of my head to test your subclass.

If you are interested, get pathos here: https://github.com/uqfoundation
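For completeness, here is a Python 3 sketch (names illustrative, assuming a Unix 'fork' start method) of what "something like @dano suggests" looks like with stdlib multiprocessing: with the __reduce__/__setstate__ override in place, the custom attribute survives the pickle round-trip that the Pool performs on task arguments:

```python
import multiprocessing as mp
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        pickled_state = super().__reduce__()
        return (pickled_state[0], pickled_state[1],
                pickled_state[2] + (self.info,))

    def __setstate__(self, state):
        self.info = state[-1]
        super().__setstate__(state[:-1])

def read_info(arr):
    # With the override, the attribute survives the pickle round-trip
    # that multiprocessing performs on every task argument.
    return arr.info

obj = RealisticInfoArray([1, 2, 3], info='foo')

# 'fork' (Unix-only) does not re-import the module, so no __main__ guard
# is needed for this short sketch.
with mp.get_context('fork').Pool(2) as pool:
    results = pool.map(read_info, [obj, obj])
print(results)
```

With pathos.multiprocessing, the Pool usage would look the same but without needing the __reduce__/__setstate__ boilerplate, since dill handles the subclass dict itself.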

Mike McKerns answered Oct 19 '22 23:10