
Pickling a dict subclass without a __reduce__ method does not load member attributes

I need a dict that only accepts values of a certain type. It also has to be picklable. Here is my first attempt:

import pickle

class TypedDict(dict):
    _dict_type = None

    def __init__(self, dict_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._dict_type = dict_type

    def __setitem__(self, key, value):
        if not isinstance(value, self._dict_type):
            raise TypeError('Wrong type')
        super().__setitem__(key, value)

If I test it with the following code (Python 3.5):

my_dict = TypedDict(int)
my_dict['foo'] = 98

with open('out.pkl', 'wb') as fin:
    pickle.dump(my_dict, fin)

with open('out.pkl', 'rb') as fin:
    out = pickle.load(fin)

I get the error: TypeError: isinstance() arg 2 must be a type or tuple of types.
It seems that the correct value for _dict_type is not loaded, and the class default None is used instead.
It also seems to depend on the protocol, since it works correctly with protocol=0.

However, if I override the __reduce__ method and just call the super implementation, everything magically works.

def __reduce__(self):
    return super().__reduce__()

How is this possible? Shouldn't the two classes (with and without __reduce__) be equivalent? What am I missing?

asked Oct 02 '17 13:10 by elabard




1 Answer

How is this possible? Shouldn't the two classes (with and without __reduce__) be equivalent? What am I missing?

You're missing a crucial step: if the class does not define a __reduce__ method (or if that method fails!), pickle uses other means to serialize your class. So a class that defines __reduce__ does not behave like a class without it, even when the override only calls super (several special methods behave like that)!
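
To make the "other means" concrete, here is a small sketch, assuming CPython 3 and a trimmed copy of the question's class (no __setitem__ override, which does not matter here). As far as I know, pickle asks for __reduce_ex__(protocol) first, and the default implementation only forwards to __reduce__ when a class overrides it. The variable names are made up for illustration.

```python
import copyreg

class TypedDict(dict):
    """Trimmed copy of the question's class, without a __reduce__ override."""
    _dict_type = None

    def __init__(self, dict_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._dict_type = dict_type

d = TypedDict(int)
d['foo'] = 98

# What pickle asks for with protocol >= 2 when __reduce__ is not overridden.
via_reduce_ex = d.__reduce_ex__(2)
# What a trivial "return super().__reduce__()" override would hand to pickle.
via_reduce = d.__reduce__()

print(via_reduce_ex[0] is copyreg.__newobj__)   # True
print(via_reduce[0] is copyreg._reconstructor)  # True

# The protocol-2 value carries the items separately, as an iterator,
# so the loader has to set them one by one on the new object.
items = list(via_reduce_ex[4])
print(items)          # [('foo', 98)]

# The __reduce__ value carries the items as a plain dict in the arguments.
print(via_reduce[1])  # (TypedDict, dict, {'foo': 98}), shown with full class reprs
```

The two recipes differ in how the key-value pairs travel, which is what decides whether your __setitem__ gets called on loading.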

In your first case pickle defaults to basic dict dumping and loading and only then handles the subclass's logic. On loading, it rebuilds the dictionary with several __setitem__ calls and sets the instance attributes afterwards. But your __setitem__ requires the instance attribute _dict_type. Since that attribute is not set yet, the lookup falls back to the class attribute None, which fails with:

TypeError: isinstance() arg 2 must be a type or tuple of types
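
The ordering can be observed directly. This is only a sketch: ProbeDict is a hypothetical stand-in for TypedDict whose __setitem__ records whether the instance attribute already exists instead of raising.

```python
import pickle

calls = []  # one entry per __setitem__ call: was _dict_type restored yet?

class ProbeDict(dict):
    """Hypothetical stand-in for TypedDict that records instead of raising."""
    _dict_type = None

    def __init__(self, dict_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._dict_type = dict_type

    def __setitem__(self, key, value):
        # True only if the instance attribute is already in place.
        calls.append('_dict_type' in self.__dict__)
        super().__setitem__(key, value)

d = ProbeDict(int)
d['foo'] = 98

# Protocol 2 and later: items are replayed through __setitem__ first.
calls.clear()
out = pickle.loads(pickle.dumps(d, protocol=2))
calls_proto2 = list(calls)
print(calls_proto2)      # [False]
print(out._dict_type)    # <class 'int'>, restored only after the items

# Protocol 0: the items are passed to dict's own initialiser instead,
# so the overridden __setitem__ is never involved.
calls.clear()
pickle.loads(pickle.dumps(d, protocol=0))
calls_proto0 = list(calls)
print(calls_proto0)      # []
```

That also matches the observation in the question that protocol=0 works.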

That is also why pickling your TypedDict without __reduce__ works as long as it contains no key-value pairs: __setitem__ is never called, and the instance attribute is set afterwards:

my_dict = TypedDict(int)

with open('out.pkl', 'wb') as fin:
    pickle.dump(my_dict, fin)

with open('out.pkl', 'rb') as fin:
    out = pickle.load(fin)

print(out._dict_type)   # <class 'int'>

On the other hand, it works if you implement __reduce__ yourself: unlike plain dicts, for which __reduce__ fails, it does work for subclasses (but it is not attempted if you don't implement __reduce__):

>>> d = {1: 1}
>>> dict.__reduce__(d)
TypeError: "can't pickle dict objects"

>>> d = TypedDict(int)
>>> dict.__reduce__(d)
(<function copyreg._reconstructor>,
 (__main__.TypedDict, dict, {}),
 {'_dict_type': int})
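
For completeness, a minimal round trip using the class from the question plus the trivial __reduce__ override. It uses pickle.dumps/pickle.loads instead of a file purely for brevity; treat it as a sketch, not the only way to make the class picklable.

```python
import pickle

class TypedDict(dict):
    _dict_type = None

    def __init__(self, dict_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._dict_type = dict_type

    def __setitem__(self, key, value):
        if not isinstance(value, self._dict_type):
            raise TypeError('Wrong type')
        super().__setitem__(key, value)

    def __reduce__(self):
        # Even a trivial override makes pickle use this recipe, which
        # rebuilds the dict contents without going through __setitem__.
        return super().__reduce__()

my_dict = TypedDict(int)
my_dict['foo'] = 98

out = pickle.loads(pickle.dumps(my_dict))
print(out)               # {'foo': 98}
print(out._dict_type)    # <class 'int'>

# The type check is still active on the restored object.
try:
    out['bar'] = 'not an int'
    rejected = False
except TypeError:
    rejected = True
print(rejected)          # True
```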

answered Oct 16 '22 10:10 by MSeifert