
How to overwrite the dump/load methods in the pickle class - customizing pickling and unpickling - Python

Tags: python, pickle

So far, what I've done is this:

import pickle

class MyPickler(pickle.Pickler):
    def __init__(self, file, protocol=None):
        super(MyPickler, self).__init__(file, protocol)

class MyUnpickler(pickle.Unpickler):
    def __init__(self, file):
        super(MyUnpickler, self).__init__(file) 

In my main method, this is essentially what I have:

#created object, then... 
pickledObject = 'testing.pickle'
with open(pickledObject,'wb') as f:
    pickle = MyPickler(f)
    pickle.dump(object) #object is the object I want to pickle, created before this

with open(pickledObject, 'rb') as pickledFile:
    unpickle = MyUnpickler(pickledFile)
    object2 = unpickle.load()

However, this is giving me the following error when the super method is called: TypeError: must be type, not classobj

How does one override only the two methods, load and dump? The pickle module is located at C:\Python27/lib/pickle.py
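On the TypeError itself: in Python 2, pickle.Pickler and pickle.Unpickler are old-style classes, and super() only accepts new-style classes, which is what "must be type, not classobj" reports. A minimal sketch of one way around it is to call the base initializer directly. The round_trip helper is a hypothetical name added for illustration, and the in-memory buffer stands in for the file on disk.

```python
import io
import pickle


class MyPickler(pickle.Pickler):
    def __init__(self, file, protocol=None):
        # Call the base initializer directly instead of going through
        # super(), which rejects Python 2's old-style pickle.Pickler.
        pickle.Pickler.__init__(self, file, protocol)


class MyUnpickler(pickle.Unpickler):
    def __init__(self, file):
        pickle.Unpickler.__init__(self, file)


def round_trip(obj):
    """Hypothetical helper: pickle obj into memory and load it back."""
    buffer = io.BytesIO()
    MyPickler(buffer).dump(obj)
    buffer.seek(0)
    return MyUnpickler(buffer).load()
```

This only makes the subclasses constructible; it does not change what gets pickled.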

EDIT The enum.py file can be found here: http://dpaste.com/780897/

Object details: Object is initialized like this:

object = CellSizeRelation(CellSizeRelation.Values.FIRST)

And CellSizeRelation is a class that uses the Enumeration:

class CellSizeRelation(Option):
    Values = enum.Enum('FIRST',
                       'SECOND')

Before I pickle object, I do this:

print object.Values._values 
print object.value.enumtype 

Output:

[EnumValue(<enum.Enum object at 0x02E80E50>, 0, 'FIRST'), EnumValue(<enum.Enum object at 0x02E80E50>, 1, 'SECOND')
<enum.Enum object at 0x02E80E50>

After I unpickle and print out the same thing, I get this output:

[EnumValue(<enum.Enum object at 0x02E80E50>, 0, 'FIRST'), EnumValue(<enum.Enum object at 0x02E80E50>, 1, 'SECOND')
<enum.Enum object at 0x02ECF750>

The problem is that the second object address changes. When first initialized, the enumtype and the enum referenced by _values have the same address; after unpickling, they differ. This breaks my code when I try to compare two EnumValue instances. If you look at the EnumValue class, the compare function tries to do this:

try:
        assert self.enumtype == other.enumtype
        result = cmp(self.index, other.index)

Because the address changes, the assertion fails. I now need to ensure that the address of the enumtype does not change when unpickled. I was thinking of simply getting the value 'FIRST' from the unpickled file, finding out its index, and reinitializing the object with:

def load(index):
    # index: position of the pickled value in the enum (e.g. 0 for 'FIRST')
    object = CellSizeRelation(CellSizeRelation.Values[index])
    return object

SaiyanGirl asked Aug 03 '12 18:08


1 Answer

You want to customize the way object state is pickled and unpickled, not the load and dump functionality itself.

You'll have to study the "Pickling and unpickling normal class instances" chapter of the pickle documentation; in your case, defining a __getstate__ and a __setstate__ method should be enough.

What happens in your case is that there is a class-level attribute with EnumValue instances, which are meant to be constants. But on unpickling, new EnumValue instances are created that are not connected to the class-level attribute anymore.
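The effect can be reproduced in isolation. The sketch below uses hypothetical stand-in classes (Registry, Member, Holder), not the classes from the linked enum.py, and assumes only that the enum type defines no __eq__ of its own, so == falls back to identity.

```python
import pickle


class Registry(object):
    """Hypothetical stand-in for the enum type; defines no __eq__."""


class Member(object):
    """Hypothetical stand-in for EnumValue: refers back to its registry."""

    def __init__(self, registry, index):
        self.enumtype = registry
        self.index = index


class Holder(object):
    # Class-level "constants", created once when the class is defined.
    registry = Registry()
    FIRST = Member(registry, 0)

    def __init__(self):
        self.value = Holder.FIRST


original = Holder()
restored = pickle.loads(pickle.dumps(original))

# Before pickling, the instance refers to the class-level objects.
same_before = original.value.enumtype == Holder.registry
# After unpickling, it refers to freshly created copies instead.
same_after = restored.value.enumtype == Holder.registry
```

Here same_before is True and same_after is False, even though restored.value.index is still 0: the data survived the round trip, but the identity did not.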

The EnumValue instances do have an index attribute you can use to capture their state as an integer instead of an instance of EnumValue, which we can use to find the correct constant again when reinstating your instances:

class CellSizeRelation(Option):
    # skipping your enum definition and __init__ here

    def __getstate__(self):
        # capture what is normally pickled
        state = self.__dict__.copy()
        # replace the `value` key (now an EnumValue instance) with its index:
        state['value'] = state['value'].index
        # what we return here will be stored in the pickle
        return state

    def __setstate__(self, newstate):
        # re-create the EnumValue instance based on the stored index
        newstate['value'] = self.Values[newstate['value']]
        # re-instate our __dict__ state from the pickled state
        self.__dict__.update(newstate)

Normally, when a class defines no __getstate__, the instance __dict__ is pickled as is. Here we return a copy of that __dict__ in which the EnumValue instance has been replaced by its index (a simple integer). On unpickling, the new instance's __dict__ would normally be updated with the unpickled state directly; because __setstate__ is defined, we can first swap the stored index back for the correct EnumValue constant.
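To try the approach end to end, here is a self-contained sketch. The Enum and EnumValue classes are simplified, hypothetical stand-ins for the ones in the linked enum.py (assumed to support attribute access by name and lookup by index), and the Option base class is left out.

```python
import pickle


class EnumValue(object):
    """Hypothetical stand-in for EnumValue in the linked enum.py."""

    def __init__(self, enumtype, index, key):
        self.enumtype = enumtype
        self.index = index
        self.key = key


class Enum(object):
    """Hypothetical stand-in for enum.Enum: one EnumValue per key."""

    def __init__(self, *keys):
        self._values = [EnumValue(self, i, key) for i, key in enumerate(keys)]
        for value in self._values:
            setattr(self, value.key, value)

    def __getitem__(self, index):
        return self._values[index]


class CellSizeRelation(object):
    Values = Enum('FIRST', 'SECOND')

    def __init__(self, value):
        self.value = value

    def __getstate__(self):
        state = self.__dict__.copy()
        # Store only the integer index, not the EnumValue instance.
        state['value'] = state['value'].index
        return state

    def __setstate__(self, newstate):
        # Look the class-level constant up again from the stored index.
        newstate['value'] = self.Values[newstate['value']]
        self.__dict__.update(newstate)


original = CellSizeRelation(CellSizeRelation.Values.FIRST)
restored = pickle.loads(pickle.dumps(original))
```

After the round trip, restored.value is the very same object as CellSizeRelation.Values.FIRST, so a comparison that asserts on enumtype no longer fails.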

Martijn Pieters answered Sep 18 '22 11:09