Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gracefully-degrading pickling in Python

(You may read this question for some background)

I would like to have a gracefully-degrading way to pickle objects in Python.

When pickling an object, let's call it the main object, sometimes the Pickler raises an exception because it can't pickle a certain sub-object of the main object. For example, an error I've been getting a lot is "can’t pickle module objects." That is because I am referencing a module from the main object.

I know I can write up a little something to replace that module with a facade that would contain the module's attributes, but that would have its own issues(1).

So what I would like is a pickling function that automatically replaces modules (and any other hard-to-pickle objects) with facades that contain their attributes. That may not produce a perfect pickling, but in many cases it would be sufficient.

Is there anything like this? Does anyone have an idea how to approach this?


(1) One issue would be that the module may be referencing other modules from within it.

like image 780
Ram Rachum Avatar asked Aug 28 '09 17:08

Ram Rachum


1 Answers

You can decide and implement how any previously-unpicklable type gets pickled and unpickled: see standard library module copy_reg (renamed to copyreg in Python 3.*).

Essentially, you need to provide a function which, given an instance of the type, reduces it to a tuple -- with the same protocol as the reduce special method (except that the reduce special method takes no arguments, since when provided it's called directly on the object, while the function you provide will take the object as the only argument).

Typically, the tuple you return has 2 items: a callable, and a tuple of arguments to pass to it. The callable must be registered as a "safe constructor" or equivalently have an attribute __safe_for_unpickling__ with a true value. Those items will be pickled, and at unpickling time the callable will be called with the given arguments and must return the unpicked object.

For example, suppose that you want to just pickle modules by name, so that unpickling them just means re-importing them (i.e. suppose for simplicity that you don't care about dynamically modified modules, nested packages, etc, just plain top-level modules). Then:

>>> import sys, pickle, copy_reg
>>> def savemodule(module):
...   return __import__, (module.__name__,)
... 
>>> copy_reg.pickle(type(sys), savemodule)
>>> s = pickle.dumps(sys)
>>> s
"c__builtin__\n__import__\np0\n(S'sys'\np1\ntp2\nRp3\n."
>>> z = pickle.loads(s)
>>> z
<module 'sys' (built-in)>

I'm using the old-fashioned ASCII form of pickle so that s, the string containing the pickle, is easy to examine: it instructs unpickling to call the built-in import function, with the string sys as its sole argument. And z shows that this does indeed give us back the built-in sys module as the result of the unpickling, as desired.

Now, you'll have to make things a bit more complex than just __import__ (you'll have to deal with saving and restoring dynamic changes, navigate a nested namespace, etc), and thus you'll have to also call copy_reg.constructor (passing as argument your own function that performs this work) before you copy_reg the module-saving function that returns your other function (and, if in a separate run, also before you unpickle those pickles you made using said function). But I hope this simple cases helps to show that there's really nothing much to it that's at all "intrinsically" complicated!-)

like image 87
Alex Martelli Avatar answered Sep 18 '22 17:09

Alex Martelli