
Can't pickle memoized class instance

Here is the code I am using:

import pickle

import funcy

@funcy.memoize
class mystery(object):

    def __init__(self, num):
        self.num = num

feat = mystery(1)

with open('num.pickle', 'wb') as f:
    pickle.dump(feat, f)

This gives me the following error:

PicklingError: Can't pickle <class '__main__.mystery'>: it's not the same object as __main__.mystery

I am hoping to 1) understand why this is happening, and 2) find a solution that allows me to pickle the object (without removing the memoization). Ideally the solution would not change the call to pickle.

I'm running Python 3.6 with funcy==1.10.

hchw asked Aug 13 '18 23:08


2 Answers

The problem is that you've applied a decorator designed for functions to a class. The result is not a class, but a function that wraps up a call to the class. This causes a number of problems (e.g., as pointed out by Aran-Fey in the comments, you can't isinstance(feat, mystery), because mystery is now a function, not a class).

But the particular problem you care about is that you can't pickle instances of a class that pickle can't find under its own name in its module.

In fact, that's basically what the error message is telling you:

PicklingError: Can't pickle <class '__main__.mystery'>: it's not the same object as __main__.mystery

Your feat thinks its type is __main__.mystery, but the name __main__.mystery no longer refers to a type at all: it refers to the function returned by the decorator, which wraps that type.
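
To see this concretely, here is a minimal sketch that reproduces the failure without funcy. The memoize helper below is a hypothetical stand-in written for this example; it is only assumed to resemble funcy.memoize in the one way that matters here, namely that it returns a plain function wrapping whatever it decorates.

```python
import pickle

def memoize(func):
    # Hypothetical stand-in for funcy.memoize: caches results by positional
    # arguments and returns a plain function, not a class.
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
class mystery(object):
    def __init__(self, num):
        self.num = num

feat = mystery(1)

# The module-level name now refers to the wrapper function...
name_is_class = isinstance(mystery, type)
# ...while the instance still remembers the original, now hidden, class.
real_class = type(feat)

try:
    pickle.dumps(feat)
    pickling_failed = False
except pickle.PicklingError:
    # pickle looks up "mystery" in the defining module, finds the wrapper
    # function instead of real_class, and refuses to continue.
    pickling_failed = True
```

Here real_class is the only remaining handle on the original class, which is why pickle cannot reach it by name.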


The easy way to fix this would be to find a class decorator that does what you want. It might be called something like flyweight instead of memoize, but I'm sure plenty of examples exist.


But you can build a flyweight class by just memoizing the constructor, instead of memoizing the class:

class mystery:
    @funcy.memoize
    def __new__(cls, num):
        return super().__new__(cls)
    def __init__(self, num):
        self.num = num

… although you probably want to move the initialization into the constructor in that case. Otherwise, calling mystery(1) a second time will return the same object as before, but also reinitialize it with self.num = 1, which is at best wasteful, and at worst incorrect. So:

class mystery:
    @funcy.memoize
    def __new__(cls, num):
        self = super().__new__(cls)
        self.num = num
        return self

And now:

>>> feat = mystery(1)
>>> feat
<__main__.mystery at 0x10eeb1278>
>>> mystery(2)
<__main__.mystery at 0x10eeb2c18>
>>> mystery(1)
<__main__.mystery at 0x10eeb1278>

And, because the type of feat is now a class that's accessible under the module-global name mystery, pickle will have no problem with it at all:

>>> pickle.dumps(feat)
b'\x80\x03c__main__\nmystery\nq\x00)\x81q\x01}q\x02X\x03\x00\x00\x00numq\x03K\x01sb.'
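
As a self-contained check of the behaviour above, the following sketch memoizes __new__ and confirms that instances are shared, that isinstance works again, and that pickle.dumps succeeds. The memoize helper is a hypothetical stand-in for funcy.memoize, written so the example runs without funcy installed; it is assumed to cache on positional arguments in a similar way.

```python
import pickle

def memoize(func):
    # Hypothetical stand-in for funcy.memoize (positional arguments only).
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

class mystery:
    @memoize
    def __new__(cls, num):
        # Build and initialize the instance once per distinct (cls, num).
        self = super().__new__(cls)
        self.num = num
        return self

feat = mystery(1)
shared = mystery(1) is feat          # same arguments, same cached object
distinct = mystery(2) is not feat    # different arguments, different object
still_a_class = isinstance(feat, mystery)
payload = pickle.dumps(feat)         # mystery is a real module-level class
```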

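You do still want to think about how this class should play with pickling. In particular, do you want unpickling to go through the cache? By default, it doesn't, and with a __new__ that requires num, loading may not work at all.

What's happening is that it's using the default __reduce_ex__ for pickling, which records only the class and the instance __dict__. With pickle protocol 2 or later (the default), unpickling does the equivalent of (only slightly oversimplified):

result = mystery.__new__(mystery)
result.__dict__.update({'num': 1})

No num is passed, so the memoized __new__ above raises a TypeError for the missing argument. With protocol 0 or 1, pickle calls object.__new__(mystery) instead, which avoids the error but also bypasses the cache, so the round trip yields a new object rather than feat.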

If you want it to go through the cache, the simplest solution is this:

class mystery:
    @funcy.memoize
    def __new__(cls, num):
        self = super().__new__(cls)
        self.num = num
        return self
    def __reduce__(self):
        return (type(self), (self.num,))
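
A quick way to confirm that __reduce__ routes unpickling through the cache is to round-trip an instance within the same process. This is only a sketch: memoize is a hypothetical stand-in for funcy.memoize, assumed to cache on positional arguments.

```python
import pickle

def memoize(func):
    # Hypothetical stand-in for funcy.memoize (positional arguments only).
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

class mystery:
    @memoize
    def __new__(cls, num):
        self = super().__new__(cls)
        self.num = num
        return self

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling mystery(num),
        # which goes through the memoized __new__.
        return (type(self), (self.num,))

feat = mystery(1)
restored = pickle.loads(pickle.dumps(feat))
round_trip_is_cached = restored is feat
```

The cache lives only in the current process, so loading the same bytes in a fresh interpreter constructs a new object the first time and reuses it afterwards.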

If you plan to do this a lot, you might think of writing your own class decorator:

def memoclass(cls):
    @funcy.memoize
    def __new__(cls, *args, **kwargs):
        return super(cls, cls).__new__(cls)
    cls.__new__ = __new__
    return cls

But this:

  • … is kind of ugly,
  • … only works with classes that don't need to pass constructor arguments to a base class,
  • … only works with classes that don't have an __init__ (or, at least, that have an idempotent and fast __init__ that's harmless to call repeatedly),
  • … doesn't provide an easy way to hook pickling, and
  • … doesn't document or test any of those restrictions.
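
To illustrate the __init__ restriction from the list above, here is a sketch that applies a memoclass-style decorator to a class with an ordinary, non-idempotent-safe __init__. The memoize helper and the counter class are hypothetical names made up for this example, and the decorator handles positional arguments only.

```python
def memoize(func):
    # Hypothetical stand-in for funcy.memoize (positional arguments only).
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

def memoclass(cls):
    @memoize
    def __new__(cls, *args):
        # Arguments are used only as the cache key; none reach the base class.
        return super(cls, cls).__new__(cls)
    cls.__new__ = __new__
    return cls

@memoclass
class counter:
    def __init__(self, start):
        self.value = start

c = counter(0)
c.value += 5          # mutate the shared instance
again = counter(0)    # same object comes back, but __init__ runs again
same_object = again is c
value_after = c.value  # reset to 0 by the second __init__ call
```

Because type.__call__ still invokes __init__ on whatever __new__ returns, every cache hit silently resets the instance's state.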

So, I think you're better off being explicit and just memoizing the __new__ method, or writing (or finding) something a lot fancier that does the introspection needed to make memoizing a class this way fully general. (Or, alternatively, maybe write one that only works with some restricted set of classes—e.g., a @memodataclass that's just like @dataclass but with a memoized constructor would be a lot easier than a fully general @memoclass.)

abarnert answered Nov 08 '22 19:11


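Another approach is to leave the class undecorated under a private name and memoize a factory function instead:

import funcy

class _mystery(object):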

    def __init__(self, num):
        self.num = num

@funcy.memoize
def mystery(num):
    return _mystery(num)

Lars Ericson answered Nov 08 '22 19:11