Pickle a frozen dataclass that has __slots__

How do I pickle an instance of a frozen dataclass with __slots__? For example, the following code raises an exception in Python 3.7.0:

import pickle
from dataclasses import dataclass

@dataclass(frozen=True)
class A:
  __slots__ = ('a',)
  a: int

b = pickle.dumps(A(5))
pickle.loads(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'a'

This works if I remove either the frozen or the __slots__. Is this just a bug?

drhagen asked Mar 22 '19 19:03



1 Answer

The problem comes from pickle using the __setattr__ method of the instance when setting the state of the slots.

The default state-restoring behaviour, used when the class defines no __setstate__ of its own, is implemented in load_build in _pickle.c (line 6220 at the time of writing).

For the items in the state dict, the instance __dict__ is updated directly:

if (PyObject_SetItem(dict, d_key, d_value) < 0)

whereas for the items in the slotstate dict, the instance's __setattr__ is used:

if (PyObject_SetAttr(inst, d_key, d_value) < 0)

Now because the instance is frozen, __setattr__ raises FrozenInstanceError when loading.

To circumvent this, you can define your own __setstate__ method which will use object.__setattr__, and not the instance's __setattr__.

The dataclasses docs hint at this when describing frozen instances:

There is a tiny performance penalty when using frozen=True: __init__() cannot use simple assignment to initialize fields, and must use object.__setattr__().
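
To make the difference concrete, here is a minimal sketch (the class name Point is made up for illustration and is not from the question). Ordinary attribute assignment on a frozen instance is rejected, while a direct call to object.__setattr__ bypasses the generated __setattr__:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:  # hypothetical example class, not from the question
    __slots__ = ('x',)
    x: int

p = Point(1)

# setattr() goes through the __setattr__ generated by frozen=True,
# which is the same route pickle takes when restoring slot values.
try:
    setattr(p, 'x', 2)
    blocked = False
except FrozenInstanceError:
    blocked = True

# object.__setattr__ skips the frozen check and writes to the slot directly.
object.__setattr__(p, 'x', 3)
```

After this runs, blocked is True and p.x is 3, which is the behaviour a custom __setstate__ can rely on.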

It may also be worth defining __getstate__, since the instance has no __dict__ in your case and pickle records it as None. If you don't, the state argument passed to __setstate__ will be the tuple (None, {'a': 5}): the first element stands for the instance's __dict__ and the second is the slotstate dict.
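
The default state can be inspected without unpickling anything. A small sketch, rebuilding the class from the question and looking at what __reduce_ex__ reports (protocol 2 is chosen here only as an example; any protocol from 2 upward should give the same state):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class A:
    __slots__ = ('a',)
    a: int

# __reduce_ex__ returns (callable, args, state, ...); index 2 is the state
# that pickle later hands to __setstate__ or to its default restore logic.
state = A(5).__reduce_ex__(2)[2]

# Without a custom __getstate__, the state is a 2-tuple:
# (contents of __dict__ or None, dict of slot values).
dict_part, slot_part = state
```

Here dict_part is None and slot_part is {'a': 5}, so a __setstate__ written against the default state would have to unpack the tuple first.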

import pickle
from dataclasses import dataclass

@dataclass(frozen=True)
class A:
    __slots__ = ('a',)
    a: int

    def __getstate__(self):
        return dict(
            (slot, getattr(self, slot))
            for slot in self.__slots__
            if hasattr(self, slot)
        )

    def __setstate__(self, state):
        for slot, value in state.items():
            object.__setattr__(self, slot, value) # <- use object.__setattr__


b = pickle.dumps(A(5))
pickle.loads(b)
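
If several frozen classes need this, the two methods could be moved into a small mixin. This is only a sketch under a few assumptions: the name FrozenSlotsPickleMixin is invented here, every __slots__ in the hierarchy is a tuple or list (not a bare string), and the mixin itself declares empty __slots__ so subclasses still get no __dict__.

```python
import pickle
from dataclasses import dataclass, FrozenInstanceError

class FrozenSlotsPickleMixin:  # hypothetical helper, not a stdlib class
    __slots__ = ()  # keep subclasses free of an instance __dict__

    def __getstate__(self):
        # Collect slot names from every class in the MRO.
        # Assumes each __slots__ is a tuple or list of names, not a bare string.
        names = [
            name
            for cls in type(self).__mro__
            for name in getattr(cls, '__slots__', ())
        ]
        return {name: getattr(self, name) for name in names if hasattr(self, name)}

    def __setstate__(self, state):
        for name, value in state.items():
            object.__setattr__(self, name, value)  # bypass the frozen __setattr__

@dataclass(frozen=True)
class B(FrozenSlotsPickleMixin):
    __slots__ = ('b',)
    b: int

restored = pickle.loads(pickle.dumps(B(5)))

# The restored instance should still refuse ordinary assignment.
try:
    restored.b = 6
    still_frozen = False
except FrozenInstanceError:
    still_frozen = True
```

The round trip gives back an instance equal to B(5), and it stays frozen because only __setstate__ uses the object.__setattr__ back door.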

I personally would not call it a bug, as the pickling process is designed to be flexible, but there is room for a feature enhancement. A revision of the pickling protocol could fix this in the future. Unless I am missing something, and aside from the tiny performance penalty, using PyObject_GenericSetattr for all the slots might be a reasonable fix?

Jacques Gaudin answered Nov 10 '22 18:11