Consider the following MWE:
#import dill as pickle # Dill exhibits similar behavior
import pickle

class B:
    def __init__(self):
        self.links = set()

class A:
    def __init__(self, base: B):
        self.base = base
        base.links.add(self)

    def __hash__(self):
        return hash(self.base)

    def __eq__(self, other):
        return self.base == other.base

pickled = pickle.dumps(A(B())) # Success
print(pickle.loads(pickled)) # Not so much
The above example fails with the following exception:
Traceback (most recent call last):
  File "./mwe.py", line 26, in <module>
    print(pickle.loads(pickled))
  File "./mwe.py", line 18, in __hash__
    return hash(self.base)
AttributeError: 'A' object has no attribute 'base'
As I understand the problem, pickle attempts to deserialize B.links before it deserializes A. The set instance used in B attempts to invoke A.__hash__ at some point, and since the instance of A is not yet fully constructed, it cannot compute its own hash, making everyone sad.
How do I get around this without breaking circular references? (Breaking the cycles would be a lot of work because the object I'm trying to serialize is hilariously complex.)
I think you've correctly identified the cause of the problem. Each instance depends on the other, and pickle fails to initialize them in the correct order. This could be considered a bug, but luckily there's an easy workaround.
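The ordering dependence can be seen by pickling the question's classes from either end of the cycle. This is a minimal sketch that reuses A and B from the question unchanged; the variable names a, b_copy, a_copy and failed are only for illustration.

```python
import pickle


class B:
    def __init__(self):
        self.links = set()


class A:
    def __init__(self, base: B):
        self.base = base
        base.links.add(self)

    def __hash__(self):
        return hash(self.base)

    def __eq__(self, other):
        return self.base == other.base


a = A(B())

# Starting from B: the A instance gets its state restored before it is
# added to the set, so __hash__ can read self.base.
b_copy = pickle.loads(pickle.dumps(a.base))
(a_copy,) = b_copy.links
print(a_copy.base is b_copy)

# Starting from A: B's set is refilled while the A instance is still an
# empty shell, so __hash__ raises AttributeError.
try:
    pickle.loads(pickle.dumps(a))
    failed = False
except AttributeError as exc:
    failed = True
    print("unpickling failed:", exc)
```

So the same object graph round-trips or fails depending only on which object pickle visits first.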
Pickle allows us to customize how objects are pickled and unpickled with the __getstate__ and __setstate__ methods. We can use these to manually set the missing base attribute of the A instance before it is hashed:
class B:
    def __init__(self):
        self.links = set()

    def __getstate__(self):
        # dump a tuple instead of a set so that the __hash__ function won't be called
        return tuple(self.links)

    def __setstate__(self, state):
        self.links = set()
        for link in state:
            link.base = self  # set the missing attribute
            self.links.add(link)  # now it can be hashed
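To check the workaround end to end, it may help to put the modified B next to the question's A and round-trip the cycle. This is only a sketch: the names original and restored are illustrative, and it assumes B has no state other than links.

```python
import pickle


class B:
    def __init__(self):
        self.links = set()

    def __getstate__(self):
        # A tuple does not hash its elements, so half-built A instances are safe here.
        return tuple(self.links)

    def __setstate__(self, state):
        self.links = set()
        for link in state:
            link.base = self      # make sure A.__hash__ can find its base
            self.links.add(link)  # hashing now succeeds


class A:
    def __init__(self, base: B):
        self.base = base
        base.links.add(self)

    def __hash__(self):
        return hash(self.base)

    def __eq__(self, other):
        return self.base == other.base


original = A(B())
restored = pickle.loads(pickle.dumps(original))

# The cycle survives the round trip: the restored B links back to the restored A.
print(next(iter(restored.base.links)) is restored)
print(restored.base is not original.base)
```

Note that this __getstate__ stores only links. If B carried other attributes, they would have to be included in the returned state and restored in __setstate__ as well.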