Pickle breaking change in Python 3.7

I have custom list and dictionary classes that no longer work while unpickling in Python 3.7.

import pickle

class A(dict):
    pass

class MyList(list): 

    def __init__(self, iterable=None, option=A):
        self.option=option
        if iterable:
            for x in iterable:
                self.append(x)

    def append(self, obj):
        if isinstance(obj, dict):
            obj = self.option(obj)
        super(MyList, self).append(obj)

    def extend(self, iterable): 
        for item in iterable:
            self.append(item)


if __name__ == '__main__':
    pickle_file = 'test_pickle'
    my_list = MyList([{'a': 1}])
    pickle.dump(my_list, open(pickle_file, 'wb'))
    loaded = pickle.load(open(pickle_file, 'rb'))
    print(isinstance(loaded[0], A))

Works fine on Python 2.6 through 3.6:

"C:\Program Files\Python36\python.exe" issue.py
True

But in 3.7 it no longer sets self.option properly:

"C:\Program Files\Python37\python.exe" issue.py

Traceback (most recent call last):
  File "issue.py", line 28, in <module>
    loaded = pickle.load(open(pickle_file, 'rb'))
  File "issue.py", line 21, in extend
    self.append(item)
  File "issue.py", line 16, in append
    obj = self.option(obj)
AttributeError: 'MyList' object has no attribute 'option'

If I remove the extend method, though, it works as expected.

I have tried adding __setstate__ as well, but it is not called before extend, so option is still undefined at that point.

I do have to inherit directly from dict and list, and I do need to override both the append and extend methods in my code. Is there a way to set option beforehand, or another fix? Is this change in behavior documented, and what is the rationale for it?

Thank you for your time

CasualDemon asked Sep 14 '18 14:09


1 Answer

In Python 3.7, unpickling list objects switched from calling .append() to calling .extend(), because that can be much faster for some list subclasses.

However, with that change, the way that the unpickling code tested for list objects also changed, from

if (PyList_Check(list)) {

to

if (PyList_CheckExact(list)) {

It is that change that affects your code. The test above guards a fast path: if the object is a list, the unpickler uses PyList_SetSlice() to load the data directly, rather than taking the slower path of explicitly calling the .extend() or .append() method on the new instance. The old version (Python 3.6 and older) accepted lists and their subclasses; the new version accepts only list itself, not subclasses!
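To make the difference between the two checks concrete, here is a simplified Python rendering of that C logic. It is only an illustration, not CPython's actual code: appends_36, appends_37 and the Logged subclass are hypothetical names, and list.__setitem__ with an empty slice stands in for PyList_SetSlice() because, like the C call, it bypasses any Python-level overrides on the subclass.

```python
class Logged(list):
    """Hypothetical subclass that records calls to its extend() override."""

    calls = []

    def extend(self, iterable):
        Logged.calls.append("extend")
        super().extend(iterable)


def appends_37(list_obj, items):
    """Simplified Python rendering of the 3.7+ C logic (not the real code)."""
    if type(list_obj) is list:  # PyList_CheckExact(): exact lists only
        n = len(list_obj)
        list.__setitem__(list_obj, slice(n, n), items)  # like PyList_SetSlice()
    elif hasattr(list_obj, "extend"):
        list_obj.extend(items)  # subclasses get their own extend() called
    else:
        for item in items:  # fallback for objects without extend()
            list_obj.append(item)


def appends_36(list_obj, items):
    """The same for 3.6 and older (again simplified)."""
    if isinstance(list_obj, list):  # PyList_Check(): lists *and* subclasses
        n = len(list_obj)
        list.__setitem__(list_obj, slice(n, n), items)  # bypasses overrides
    else:
        for item in items:
            list_obj.append(item)


old, new = Logged(), Logged()
appends_36(old, [1, 2])
print(Logged.calls)  # [] : the extend() override was skipped
appends_37(new, [1, 2])
print(Logged.calls)  # ['extend']
```

Both objects end up holding [1, 2]; only the 3.7-style path runs the subclass's own extend().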

So in Python 3.6 and older, your custom MyList.append() method is not called when unpickling, purely because you subclassed list. In Python 3.7, your custom MyList.extend() method is called when unpickling. This is very much intentional: subclasses should be allowed to provide a custom .extend() method that gets called when unpickling.
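A small way to observe this, assuming Python 3.7 or newer: the hypothetical TracingList below records every extend()/append() call the unpickler makes, together with whether the (made-up) instance attribute tag exists at that moment. Three items are used so the pickle contains a batched APPENDS opcode.

```python
import pickle


class TracingList(list):
    """Hypothetical list subclass that records the unpickler's method calls."""

    calls = []  # class-level log of (method name, "tag" already set?)

    def append(self, obj):
        TracingList.calls.append(("append", "tag" in self.__dict__))
        super().append(obj)

    def extend(self, iterable):
        TracingList.calls.append(("extend", "tag" in self.__dict__))
        super().extend(iterable)


original = TracingList([1, 2, 3])
original.tag = "restored later"
TracingList.calls.clear()  # only log what happens during unpickling

loaded = pickle.loads(pickle.dumps(original, protocol=2))
print(TracingList.calls)  # [('extend', False)] on 3.7+
print(loaded.tag)  # restored later
```

The extend() override runs once with all items, and at that point tag does not exist yet; it only appears after unpickling finishes.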

The workaround is simple. When unpickling, your data is already wrapped, so you don't need to re-apply the wrapper. When self.option is not set, simply skip applying it:

def append(self, obj):
    if isinstance(obj, dict):
        try:
            obj = self.option(obj)
        except AttributeError:
            # something's wrong, are we unpickling on Python 3.7 or newer?
            if 'option' in self.__dict__:
                # no, we are not, because 'option' has been set, this must
                # be an error in the option() call, so re-raise
                raise
            # yes, we are, just ignore this, obj is already wrapped
    super(MyList, self).append(obj)
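Dropped into the question's classes, a complete round trip could look like the sketch below. It assumes the question's A and MyList unchanged apart from the patched append(), and uses pickle.dumps()/pickle.loads() instead of a file for brevity.

```python
import pickle


class A(dict):
    pass


class MyList(list):
    def __init__(self, iterable=None, option=A):
        self.option = option
        if iterable:
            for x in iterable:
                self.append(x)

    def append(self, obj):
        if isinstance(obj, dict):
            try:
                obj = self.option(obj)
            except AttributeError:
                # 'option' is missing only while the unpickler restores items
                # (Python 3.7+), and those items are already wrapped.
                if "option" in self.__dict__:
                    raise  # the error came from inside option() itself
        super().append(obj)

    def extend(self, iterable):
        for item in iterable:
            self.append(item)


my_list = MyList([{"a": 1}, {"b": 2}])
loaded = pickle.loads(pickle.dumps(my_list))
print(all(isinstance(item, A) for item in loaded))  # True
print(loaded.option is A)  # True: restored after the items
```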

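This does mean you can't rely on any instance attributes having been restored yet. If that's a big problem (you still need to consult instance state while unpickling), then you'll have to provide a different __reduce_ex__ method, one that doesn't return the data as an iterator in index 3 of the resulting tuple. For protocol versions 2 and higher, the default __reduce_ex__() of a list subclass instance returns (copyreg.__newobj__, (type(self),), self.__dict__, iter(self), None), with None in place of self.__dict__ when the instance dictionary is empty. The pickler writes the items (index 3) before the state (index 2), so the unpickler calls extend() before it restores self.option, and before any __setstate__ would run.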
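To inspect that tuple and the resulting opcode order yourself, the sketch below uses a trimmed-down, hypothetical stand-in for MyList that keeps only the option attribute. It unpacks __reduce_ex__(2) and then walks the pickle stream with pickletools.genops() to compare the positions of the APPENDS and BUILD opcodes.

```python
import copyreg
import pickle
import pickletools


class A(dict):
    pass


class MyList(list):
    """Trimmed-down stand-in for the question's class: only the option attribute."""

    def __init__(self, iterable=(), option=A):
        self.option = option
        super().__init__(iterable)


my_list = MyList([1, 2])
func, args, state, listitems, dictitems = my_list.__reduce_ex__(2)
items = list(listitems)  # index 3 is an iterator over the list items
print(func is copyreg.__newobj__, args, state, items, dictitems)

# Opcode names in stream order: APPENDS (restoring the items) comes before
# BUILD (restoring __dict__, and with it self.option).
stream = pickle.dumps(my_list, protocol=2)
opnames = [op.name for op, _arg, _pos in pickletools.genops(stream)]
print(opnames.index("APPENDS") < opnames.index("BUILD"))  # True
```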

A custom version could return (type(self), (tuple(self), self.option), None, None, None), for example: unpickling then calls MyList(tuple(self), self.option), so __init__ sets option before any item passes through append(). That does come with some additional overhead (the tuple(self) there takes additional memory when pickling and unpickling).
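A minimal sketch of that custom __reduce_ex__, assuming the question's A and MyList. Because option is now always set before append() runs, the try/except workaround is not needed in this variant.

```python
import pickle


class A(dict):
    pass


class MyList(list):
    def __init__(self, iterable=None, option=A):
        self.option = option
        if iterable:
            for x in iterable:
                self.append(x)

    def append(self, obj):
        if isinstance(obj, dict):
            obj = self.option(obj)  # option is always set by __init__ here
        super().append(obj)

    def extend(self, iterable):
        for item in iterable:
            self.append(item)

    def __reduce_ex__(self, protocol):
        # Unpickling calls MyList(tuple(self), self.option): __init__ sets
        # option first, then appends the items. No state, no list-items
        # iterator and no dict-items iterator are returned.
        return (type(self), (tuple(self), self.option), None, None, None)


my_list = MyList([{"a": 1}], option=A)
loaded = pickle.loads(pickle.dumps(my_list))
print(isinstance(loaded[0], A), loaded.option is A)  # True True
```

Returning type(self) as the callable means any subclass must keep an __init__ that accepts (iterable, option) in that order.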

Martijn Pieters answered Oct 05 '22 05:10