
How to deal with `pickle.load` calling `__setitem__` which is not ready for use yet?

I was trying to implement a (prototype, not production) version of a persistent dictionary that uses pickle on disk as persistent storage. However, pickle.load calls __setitem__ for its own purposes, and that's the method that is (of course) overridden to ensure changes to the dictionary are propagated back to the persistent storage -- and so it calls pickle.dump. Of course, it's not ok to call pickle.dump as every item is being set during unpickling.

Is there any way to solve this, other than by brute force (as below)? I tried reading Pickling Class Instances in search of a solution using special methods, but didn't find any.

The code below monitors whether unpickling is in progress, and skips pickle.dump in that case; while it works fine, it feels hacky.

import os, pickle

class PersistentDict(dict):
    def __new__(cls, *args, **kwargs):
        if not args: # when unpickling
            obj = dict.__new__(cls)
            obj.uninitialized = True
            return obj
        path, *args = args
        if os.path.exists(path):
            with open(path, 'rb') as f:  # close the file once loading is done
                obj = pickle.load(f)
            del obj.uninitialized
            return obj
        else:
            obj = dict.__new__(cls, *args, **kwargs)
            obj.path = path
            obj.dump()
            return obj

    def __init__(self, *args, **kwargs):
        pass

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.dump()

    def __delitem__(self, key):
        super().__delitem__(key)
        self.dump()

    def dump(self):
        if not hasattr(self, 'uninitialized'):
            with open(self.path, 'wb') as f:  # flush and close after each dump
                pickle.dump(self, f)

    def clear(self):
        os.remove(self.path)

pd = PersistentDict('abc')
assert pd == {}
pd[1] = 2
assert pd == {1: 2}
pd[2] = 4
assert pd == {1: 2, 2: 4}
del pd[1]
assert pd == {2: 4}
xd = PersistentDict('abc')
assert xd == {2: 4}
xd[3] = 6
assert xd == {2: 4, 3: 6}
yd = PersistentDict('abc')
assert yd == {2: 4, 3: 6}
yd.clear()
asked Oct 18 '22 17:10 by max


1 Answer

Inheriting directly from dict is not advised for fancy dictionary implementations. For one thing, CPython's built-in dict takes some shortcuts that can skip calls to overridden dunder methods. Also, as you noticed when pickling and unpickling, dictionaries and their direct subclasses are treated differently from ordinary objects (which have their __dict__ attribute pickled, rather than their keys set with __setitem__).
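
Both effects can be observed with a small experiment. This is only a sketch: LoggingDict and its class-level calls list are names made up for illustration, and the behavior shown is that of CPython's built-in dict.

```python
import pickle

class LoggingDict(dict):
    """dict subclass that records every key passed to __setitem__."""
    calls = []  # class-level log shared by all instances (illustration only)

    def __setitem__(self, key, value):
        LoggingDict.calls.append(key)
        super().__setitem__(key, value)

d = LoggingDict(a=1)    # the constructor fills the dict without __setitem__
d.update(b=2)           # so does update()
d.setdefault('c', 3)    # and setdefault()
shortcut_calls = list(LoggingDict.calls)

d['d'] = 4              # plain item assignment does reach the override
direct_calls = list(LoggingDict.calls)

# Unpickling a dict subclass restores its items through __setitem__.
LoggingDict.calls.clear()
restored = pickle.loads(pickle.dumps(d))
unpickle_calls = list(LoggingDict.calls)
```

So the override is skipped by the constructor, update() and setdefault(), yet it is called once per item during unpickling, which is exactly the situation described in the question.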

So, for one thing, start by inheriting from collections.UserDict: this is a different implementation of dict which ensures all data access goes through a proper Python-side call to the dunder special methods. You might even want to implement it as a subclass of collections.abc.MutableMapping, which only requires you to implement a minimal number of methods for your class to work as if it were a real dictionary.
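
As a rough illustration of the difference, here is the same kind of logging override on top of UserDict. LoggingUserDict and its calls attribute are hypothetical names used only for this sketch.

```python
from collections import UserDict

class LoggingUserDict(UserDict):
    """UserDict subclass that records every key passed to __setitem__."""

    def __init__(self, *args, **kwargs):
        self.calls = []  # must exist before UserDict.__init__ starts storing items
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        self.calls.append(key)
        super().__setitem__(key, value)

d = LoggingUserDict(a=1)   # the constructor goes through update()
d.update(b=2)              # update() assigns with self[key] = value
d.setdefault('c', 3)       # setdefault() does the same for a missing key
d['d'] = 4
```

Here every write path ends up in the overridden __setitem__, and the actual items live in the ordinary d.data dictionary.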

Second thing: the pickle protocol will do "its thing" by default, which for dict subclasses is (I haven't checked, but apparently) pickling the (key, value) pairs and calling __setitem__ for each of them on unpickling. But the pickling behavior is fully customizable, as you can see in the documentation: you can simply implement explicit __getstate__ and __setstate__ methods on your class to have full control over the pickling/unpickling code.
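
One way to check what pickle does by default is to look at the reduce value an object reports. The sketch below assumes CPython's default object.__reduce_ex__ for protocol 2 and later, which returns a 5-tuple whose third element is the state and whose fifth element is an iterator of dictionary items (or None). DictSubclass and Plain are throwaway names for illustration.

```python
import pickle
from collections import UserDict

class DictSubclass(dict):
    pass

class Plain:
    def __init__(self):
        self.x = 1

proto = pickle.HIGHEST_PROTOCOL
dict_reduced = DictSubclass(a=1).__reduce_ex__(proto)
plain_reduced = Plain().__reduce_ex__(proto)
user_reduced = UserDict(a=1).__reduce_ex__(proto)

# A dict subclass reports its items separately; they are replayed with
# obj[key] = value when the object is unpickled.
dict_items = list(dict_reduced[4])

# Ordinary objects (UserDict included) report only their instance __dict__.
plain_state, plain_items = plain_reduced[2], plain_reduced[4]
user_state, user_items = user_reduced[2], user_reduced[4]
```

As far as I can tell, the item pairs are replayed before the state is restored, so on a direct dict subclass __getstate__ and __setstate__ alone would not stop the __setitem__ calls. That is another reason to pair them with a class that keeps its data in an ordinary attribute.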

Example using MutableMapping, and storing the dictionary contents in an associated internal dictionary:

from collections.abc import MutableMapping

class SpecialDict(MutableMapping):
    def __init__(self, path, **kwargs):
        self.path = path
        self.content = dict(**kwargs)
        self.dump()

    def __getitem__(self, key):
        return self.content[key]

    def __setitem__(self, key, value):
        self.content[key] = value
        self.dump()

    def __delitem__(self, key):
        del self.content[key]
        self.dump()

    def __iter__(self):
        return iter(self.content)

    def __len__(self):
        return len(self.content)

    def dump(self):
        ...

    def __getstate__(self):
        return (self.path, self.content)

    def __setstate__(self, state):
        self.path = state[0]
        self.content = state[1]
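
For reference, here is one way the skeleton above could be filled in and exercised end to end. It is a sketch under stated assumptions, not the definitive implementation: the class name PickledDict, the load_or_create helper, the dumps counter and the temporary file location are all invented for illustration, and dump() is assumed to pickle the whole object to self.path.

```python
import os
import pickle
import tempfile
from collections.abc import MutableMapping

class PickledDict(MutableMapping):
    def __init__(self, path, **kwargs):
        self.path = path
        self.content = dict(**kwargs)
        self.dumps = 0  # counts writes to disk (illustration only)
        self.dump()

    @classmethod
    def load_or_create(cls, path):
        """Hypothetical helper: reopen an existing store or start a new one."""
        if os.path.exists(path):
            with open(path, 'rb') as f:
                return pickle.load(f)
        return cls(path)

    def __getitem__(self, key):
        return self.content[key]

    def __setitem__(self, key, value):
        self.content[key] = value
        self.dump()

    def __delitem__(self, key):
        del self.content[key]
        self.dump()

    def __iter__(self):
        return iter(self.content)

    def __len__(self):
        return len(self.content)

    def dump(self):
        # Assumed behavior: rewrite the whole pickle on every change.
        self.dumps += 1
        with open(self.path, 'wb') as f:
            pickle.dump(self, f)

    def __getstate__(self):
        return (self.path, self.content)

    def __setstate__(self, state):
        # Attributes are restored directly: no __init__, no __setitem__, no dump().
        self.path, self.content = state
        self.dumps = 0

path = os.path.join(tempfile.mkdtemp(), 'store.pickle')
pd = PickledDict.load_or_create(path)
pd[1] = 2
pd[2] = 4
del pd[1]

xd = PickledDict.load_or_create(path)  # reloaded from disk
```

After the reload, xd compares equal to {2: 4} and its dumps counter is still 0, which shows that unpickling no longer writes back to disk, without any "uninitialized" flag.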

BTW, a big advantage of using the MutableMapping superclass is that, if you properly implement the methods described in the documentation, the rest of the mapping interface is guaranteed to behave like a real dictionary's (so there is no need to worry about missing obscure corner cases).
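
A short sketch of what that guarantee looks like in practice, using a hypothetical CountingMapping whose writes counter stands in for dump(): the mixin methods update(), setdefault(), pop() and clear() are never defined here, yet each one goes through the overridden __setitem__ or __delitem__.

```python
from collections.abc import MutableMapping

class CountingMapping(MutableMapping):
    """Minimal mapping that counts calls to __setitem__ and __delitem__."""

    def __init__(self):
        self.content = {}
        self.writes = 0

    def __getitem__(self, key):
        return self.content[key]

    def __setitem__(self, key, value):
        self.content[key] = value
        self.writes += 1

    def __delitem__(self, key):
        del self.content[key]
        self.writes += 1

    def __iter__(self):
        return iter(self.content)

    def __len__(self):
        return len(self.content)

m = CountingMapping()
m.update({'a': 1, 'b': 2})   # two __setitem__ calls
m.setdefault('c', 3)         # one __setitem__ call
popped = m.pop('a')          # one __delitem__ call
writes_before_clear = m.writes
m.clear()                    # one __delitem__ call per remaining item

# Forgetting an abstract method is caught when the class is instantiated.
class Incomplete(MutableMapping):
    def __getitem__(self, key):
        raise KeyError(key)

incomplete_error = None
try:
    Incomplete()
except TypeError as exc:
    incomplete_error = exc
```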

answered Oct 20 '22 09:10 by jsbueno