I was going to ask "How to pickle a class that inherits from dict and defines __slots__". Then I realized the utterly mind-wrenching solution in class B below actually works...
import pickle

class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge): self.porridge = porridge

class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        # Returning the very item being pickled in 'self'??
        return self, self.porridge
    def __setstate__(self, state):
        print "__setstate__(%s) type(%s, %s)" % (state, type(state[0]),
                                                type(state[1]))
        self.update(state[0])
        self.porridge = state[1]
Here is some output:
>>> saved = pickle.dumps(A(10))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
>>> b = B('delicious')
>>> b['butter'] = 'yes please'
>>> loaded = pickle.loads(pickle.dumps(b))
__setstate__(({'butter': 'yes please'}, 'delicious')) type(<class '__main__.B'>, <type 'str'>)
>>> b
{'butter': 'yes please'}
>>> b.porridge
'delicious'
So basically, pickle cannot pickle a class that defines __slots__ without also defining __getstate__. Which is a problem if the class inherits from dict - because how do you return the content of the instance without returning self, which is the very instance pickle is already trying to pickle, and can't do so without calling __getstate__. Notice how __setstate__ is actually receiving an instance of B as part of the state.
Well, it works... but can someone explain why? Is it a feature or a bug?
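For readers on Python 3, here is a minimal sketch of the same failure. It assumes Python 3, where (as far as I can tell) the TypeError only shows up with the old pickle protocols 0 and 1, while protocol 2 and later handle __slots__ without a custom __getstate__. The helper try_pickle is a hypothetical name introduced only for this illustration.

```python
import pickle

class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge):
        self.porridge = porridge

def try_pickle(obj, protocol):
    """Return the TypeError message if pickling fails, otherwise None."""
    try:
        pickle.dumps(obj, protocol)
    except TypeError as exc:
        return str(exc)
    return None

# Protocol 0 goes through the same code path as the Python 2 example above.
error_proto0 = try_pickle(A(10), 0)
# Protocol 2 collects the slot values on its own, so no error is expected.
error_proto2 = try_pickle(A(10), 2)
print(error_proto0)
print(error_proto2)
```

So on Python 3 the question mostly arises when an old protocol is requested explicitly.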
__getstate__ should return an object (representing the instance's state) that will be pickled and saved. __setstate__ should take that object as its parameter and use it to restore the state the instance had before.
In Python 3, pickle.loads() reads a pickled object representation from a bytes object and returns the reconstituted object hierarchy.
Pickle in Python is primarily used for serializing and deserializing a Python object structure. In other words, it converts a Python object into a byte stream so that it can be stored in a file or database, used to maintain program state across sessions, or transported over the network.
Instances of most classes can be pickled. By default, the pickle is written in a binary format that is most compatible when sharing between Python 3 programs.
Maybe I'm a bit late to the party, but this question didn't get an answer that actually explains what's happening, so here we go.
Here's a quick summary for those who don't want to read this whole post (it got a bit long...):
You don't need to take care of the contained dict instance in __getstate__() -- pickle will do this for you.
If you include self in the state anyway, pickle's cycle detection will prevent an infinite loop.
__getstate__() and __setstate__() methods for custom classes derived from dict
Let's start with the right way to write the __getstate__() and __setstate__() methods of your class. You don't need to take care of pickling the contents of the dict instance contained in B instances -- pickle knows how to deal with dictionaries and will do this for you. So this implementation will be enough:
class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        return self.porridge
    def __setstate__(self, state):
        self.porridge = state
Example:
>>> a = B("oats")
>>> a[42] = "answer"
>>> b = pickle.loads(pickle.dumps(a))
>>> b
{42: 'answer'}
>>> b.porridge
'oats'
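As a self-contained version of the example above, here is a sketch that should also run on Python 3 with the default pickle protocol. The variable names original and restored are mine, not from the question.

```python
import pickle

class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge):
        self.porridge = porridge

class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        # Only the slot value is returned; the dict items are pickled separately.
        return self.porridge
    def __setstate__(self, state):
        self.porridge = state

original = B("oats")
original[42] = "answer"
restored = pickle.loads(pickle.dumps(original))
print(restored, restored.porridge)
```

One caveat I am fairly sure about but have not checked on every version: __setstate__() may be skipped when the state is None (or, with the old protocols, any false value), so a state such as an empty string is worth testing separately.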
Why does your implementation work as well, and what's happening under the hood? That's a bit more involved, but -- once we know that the dictionary gets pickled anyway -- not too hard to figure out. If the pickle module encounters an instance of a user-defined class, it calls the __reduce__() method of this class, which in turn calls __getstate__() (actually, it usually calls the __reduce_ex__() method, but that does not matter here). Let's define B again as you originally did, i.e. using the "recursive" definition of __getstate__(), and let's see what we get when calling __reduce__() for an instance of B now:
>>> a = B("oats")
>>> a[42] = "answer"
>>> a.__reduce__()
(<function _reconstructor at 0xb7478454>,
(<class '__main__.B'>, <type 'dict'>, {42: 'answer'}),
({42: 'answer'}, 'oats'))
As we can see from the documentation of __reduce__(), the method returns a tuple of 2 to 5 elements. The first element is a function that will be called to reconstruct the instance when unpickling, the second element is the tuple of arguments that will be passed to this function, and the third element is the return value of __getstate__(). We can already see that the dictionary information is included twice. The function _reconstructor() is an internal function of the copy_reg module that reconstructs the base class before __setstate__() is called when unpickling. (Have a look at the source code of this function if you like -- it's short!)
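To poke at this tuple yourself, here is a rough sketch for Python 3, where copy_reg has been renamed copyreg. It assumes that calling __reduce__() directly still takes the old-protocol path that returns _reconstructor, which is what I see in CPython. The names func, args, state and rebuilt are only for illustration.

```python
class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge):
        self.porridge = porridge

class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        # The "recursive" definition from the question.
        return self, self.porridge
    def __setstate__(self, state):
        self.update(state[0])
        self.porridge = state[1]

a = B("oats")
a[42] = "answer"

# Unpack the three elements described above.
func, args, state = a.__reduce__()

# Calling the reconstruction function by hand rebuilds only the dict part;
# the slot stays unset until __setstate__() is called with the state.
rebuilt = func(*args)
print(func.__name__, args, rebuilt, hasattr(rebuilt, "porridge"))
```

The rebuilt object is a B instance that already holds the dictionary items, which is why __getstate__() does not have to return them.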
Now the pickler needs to pickle the return value of a.__reduce__(). It basically pickles the three elements of this tuple one after the other. The second element is a tuple again, and its items are also pickled one after the other. The third item of this inner tuple (i.e. a.__reduce__()[1][2]) is of type dict and is pickled using the internal pickler for dictionaries. The third element of the outer tuple (i.e. a.__reduce__()[2]) is also a tuple again, consisting of the B instance itself and a string. When pickling the B instance, the cycle detection of the pickle module kicks in: pickle realises this exact instance has already been dealt with (it keeps a memo of pickled objects keyed by their id()), and only stores a reference to it instead of really pickling it again -- this is why no infinite loop occurs.
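The cycle detection can be seen in isolation with a plain list that contains itself. This is just a small sketch and has nothing specific to do with B; the names cyclic and restored are made up for the example.

```python
import pickle

# A list that contains itself: about the simplest reference cycle there is.
cyclic = ["oats"]
cyclic.append(cyclic)

# dumps() terminates because the list is memoised before its items are
# pickled, so the inner occurrence is written as a reference to the memo.
restored = pickle.loads(pickle.dumps(cyclic))

# The cycle survives the round trip.
print(restored[1] is restored)
```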
When unpickling this mess again, the unpickler first reads the reconstruction function and its arguments from the stream. The function is called, resulting in a B instance with the dictionary part already initialised. Next, the unpickler reads the state. It encounters a tuple consisting of a reference to an already unpickled object -- namely our instance of B -- and a string, "oats". This tuple is now passed to B.__setstate__(). The first element of state and self are now the same object, as can be seen by adding the line
    print self is state[0]
to your __setstate__() implementation (it prints True!). The line
    self.update(state[0])
consequently simply updates the instance with itself.
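If you want to check the identity claim without editing your own class, here is a sketch for Python 3 that records the result of the comparison instead of printing it. The list identity_checks is a hypothetical helper added for this illustration, and the default pickle protocol is assumed.

```python
import pickle

identity_checks = []  # filled in by __setstate__() during unpickling

class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge):
        self.porridge = porridge

class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        return self, self.porridge
    def __setstate__(self, state):
        # state[0] should be the very object that is being restored.
        identity_checks.append(state[0] is self)
        self.update(state[0])  # effectively a no-op: updates self with itself
        self.porridge = state[1]

b = B("delicious")
b["butter"] = "yes please"
loaded = pickle.loads(pickle.dumps(b))
print(identity_checks, loaded, loaded.porridge)
```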