
Why does pickle __getstate__ accept as a return value the very instance it required __getstate__ to pickle in the first place?

Tags: python, pickle

I was going to ask "How to pickle a class that inherits from dict and defines __slots__". Then I realized the utterly mind-wrenching solution in class B below actually works...

import pickle

class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge): self.porridge = porridge

class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        # Returning the very item being pickled in 'self'??
        return self, self.porridge 
    def __setstate__(self, state):
        # Parenthesised so the line works both as a Python 2 statement and a Python 3 call.
        print("__setstate__(%s) type(%s, %s)" % (state, type(state[0]),
                                                 type(state[1])))
        self.update(state[0])
        self.porridge = state[1]

Here is some output:

>>> saved = pickle.dumps(A(10))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
>>> b = B('delicious')
>>> b['butter'] = 'yes please'
>>> loaded = pickle.loads(pickle.dumps(b))
__setstate__(({'butter': 'yes please'}, 'delicious')) type(<class '__main__.B'>, <type 'str'>)
>>> b
{'butter': 'yes please'}
>>> b.porridge
'delicious'

So basically, pickle cannot pickle a class that defines __slots__ unless the class also defines __getstate__. That is a problem if the class inherits from dict: how do you return the content of the instance without returning self, which is the very instance pickle is already trying to pickle and cannot pickle without calling __getstate__? Notice how __setstate__ actually receives an instance of B as part of the state.

Well, it works... but can someone explain why? Is it a feature or a bug?

asked Mar 09 '11 14:03 by porgarmingduod



1 Answer

Maybe I'm a bit late to the party, but this question didn't get an answer that actually explains what's happening, so here we go.

Here's a quick summary for those who don't want to read this whole post (it got a bit long...):

  1. You don't need to take care of the contained dict instance in __getstate__() -- pickle will do this for you.

  2. If you include self in the state anyway, pickle's cycle detection will prevent an infinite loop.

Writing __getstate__() and __setstate__() methods for custom classes derived from dict

Let's start with the right way to write the __getstate__() and __setstate__() methods of your class. You don't need to take care of pickling the contents of the dict instance contained in B instances -- pickle knows how to deal with dictionaries and will do this for you. So this implementation will be enough:

class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        return self.porridge 
    def __setstate__(self, state):
        self.porridge = state

Example:

>>> a = B("oats")
>>> a[42] = "answer"
>>> b = pickle.loads(pickle.dumps(a))
>>> b
{42: 'answer'}
>>> b.porridge
'oats'
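For readers on Python 3, the same idea can be sketched as a self-contained script. This is only an illustration under a few assumptions: it targets Python 3, it declares an empty __slots__ in B because A already provides the porridge slot, and it round-trips with both protocol 0 and the highest protocol available to show that the slimmer __getstate__() is enough in either case.

```python
import pickle


class A(dict):
    __slots__ = ["porridge"]

    def __init__(self, porridge):
        self.porridge = porridge


class B(A):
    # Assumption for this sketch: the slot is inherited from A,
    # so B does not need to declare it again.
    __slots__ = []

    def __getstate__(self):
        # Only the slot value; pickle saves the dict items on its own.
        return self.porridge

    def __setstate__(self, state):
        self.porridge = state


a = B("oats")
a[42] = "answer"

# Round-trip with the oldest and the newest protocol of this interpreter.
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    b = pickle.loads(pickle.dumps(a, protocol))
    assert type(b) is B
    assert b == {42: "answer"}
    assert b.porridge == "oats"
```

Neither method touches the dictionary contents, yet they survive the round trip.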

What's happening in your implementation?

Why does your implementation work as well, and what's happening under the hood? That's a bit more involved, but -- once we know that the dictionary gets pickled anyway -- not too hard to figure out. If the pickle module encounters an instance of a user-defined class, it calls the __reduce__() method of this class, which in turn calls __getstate__() (actually, it usually calls the __reduce_ex__() method, but that does not matter here). Let's define B again as you originally did, i.e. using the "recursive" definition of __getstate__(), and let's see what we get when calling __reduce__() for an instance of B now:

>>> a = B("oats")
>>> a[42] = "answer"
>>> a.__reduce__()
(<function _reconstructor at 0xb7478454>,
 (<class '__main__.B'>, <type 'dict'>, {42: 'answer'}),
 ({42: 'answer'}, 'oats'))

As we can see from the documentation of __reduce__(), the method returns a tuple of two to five elements (up to six since Python 3.8). The first element is a function that will be called to reconstruct the instance when unpickling, the second element is the tuple of arguments that will be passed to this function, and the third element is the return value of __getstate__(). We can already see that the dictionary information is included twice. The function _reconstructor() is an internal function of the copy_reg module (renamed copyreg in Python 3) that reconstructs the base class before __setstate__() is called when unpickling. (Have a look at the source code of this function if you like -- it's short!)
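The three parts of the reduce tuple can also be checked programmatically. The following is a sketch for Python 3, not the session from the answer: it assumes that __reduce_ex__(0) goes through the same copyreg code path as the Python 2 session above, and it relies on the private helper copyreg._reconstructor, which may change between versions.

```python
import copyreg


class A(dict):
    __slots__ = ["porridge"]

    def __init__(self, porridge):
        self.porridge = porridge


class B(A):
    __slots__ = []  # assumption: reuse the slot declared in A

    def __getstate__(self):
        # The "recursive" version from the question.
        return self, self.porridge

    def __setstate__(self, state):
        self.update(state[0])
        self.porridge = state[1]


a = B("oats")
a[42] = "answer"

# Protocol 0 uses the copyreg-based reduction discussed in the text.
func, args, state = a.__reduce_ex__(0)

assert func is copyreg._reconstructor      # reconstruction function
assert args == (B, dict, {42: "answer"})   # class, base class, base state
assert type(args[2]) is dict               # a plain dict copy of the items
assert state[0] is a                       # __getstate__() returned self ...
assert state[1] == "oats"                  # ... plus the slot value
```

The dictionary contents therefore appear once as a plain dict copy in the arguments and once more through the instance inside the state.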

Now the pickler needs to pickle the return value of a.__reduce__(). It basically pickles the three elements of this tuple one after the other. The second element is a tuple again, and its items are also pickled one after the other. The third item of this inner tuple (i.e. a.__reduce__()[1][2]) is of type dict and is pickled using the internal pickler for dictionaries. The third element of the outer tuple (i.e. a.__reduce__()[2]) is also a tuple again, consisting of the B instance itself and a string. When pickling the B instance, the cycle detection of the pickle module kicks in: pickle realises this exact instance has already been dealt with, and stores only a reference to the earlier entry in its memo (which is keyed by id()) instead of pickling it again -- this is why no infinite loop occurs.
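The memo mechanism is independent of __getstate__() and can be observed with plain built-in containers. The snippet below is a small Python 3 illustration; the variable names are made up for the example, and the opcode check only assumes that every back-reference opcode has a name ending in "GET".

```python
import pickle
import pickletools

# The same object referenced twice is stored once and referenced once.
inner = {"k": "v"}
outer = [inner, inner]
data = pickle.dumps(outer)
loaded = pickle.loads(data)
assert loaded[0] is loaded[1]

# The back-reference shows up as a GET-style opcode in the stream.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
assert any(name.endswith("GET") for name in opcode_names)

# A container that contains itself does not send the pickler into a loop.
cyclic = []
cyclic.append(cyclic)
loaded_cyclic = pickle.loads(pickle.dumps(cyclic))
assert loaded_cyclic[0] is loaded_cyclic
```

The B instance in the state tuple is handled the same way: by the time the state is pickled, the instance is already in the memo.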

When unpickling this mess again, the unpickler first reads the reconstruction function and its arguments from the stream. The function is called, resulting in a B instance with the dictionary part already initialised. Next, the unpickler reads the state. It encounters a tuple consisting of a reference to an already unpickled object -- namely our instance of B -- and a string, "oats". This tuple now is passed to B.__setstate__(). The first element of state and self are the same object now, as can be seen by adding the line

print(self is state[0])

to your __setstate__() implementation (it prints True!). The line

self.update(state[0])

consequently simply updates the instance with itself.
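The identity check can be made without reading printed output. Here is a sketch for Python 3 with the default protocol; the module-level list identity_checks is a hypothetical helper added only for this example, and B again reuses the slot from A.

```python
import pickle

identity_checks = []  # hypothetical log, filled by __setstate__ below


class A(dict):
    __slots__ = ["porridge"]

    def __init__(self, porridge):
        self.porridge = porridge


class B(A):
    __slots__ = []  # assumption: the slot is inherited from A

    def __getstate__(self):
        return self, self.porridge

    def __setstate__(self, state):
        # Record whether the unpickled state refers back to this instance.
        identity_checks.append(self is state[0])
        self.update(state[0])  # updates the dict with itself
        self.porridge = state[1]


b = B("delicious")
b["butter"] = "yes please"
loaded = pickle.loads(pickle.dumps(b))

assert identity_checks == [True]
assert loaded == {"butter": "yes please"}
assert loaded.porridge == "delicious"
```

Since state[0] is the instance under construction, the update() call adds nothing new; only the assignment to porridge changes the object.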

answered Oct 13 '22 04:10 by Sven Marnach