Store object using Python pickle, and load it into different namespace

I'd like to pass object state between two Python programs that run in different namespaces: one is my own code running standalone, the other is a Pyramid view. Somewhat related questions are here or here, but I can't quite apply them to my scenario.

My own code defines a global class (i.e. in the __main__ namespace) with a fairly complex structure:

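# An instance of this is a colorful mess of nested lists and sets and dicts.
class MyClass:
    def __init__(self):
        # Store the containers as instance attributes (not locals),
        # so they become part of the pickled state.
        self.data = set()
        self.more = dict()
        ...

    def do_sth(self):
        ...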

At some point I pickle an instance of this class:

import pickle

c = MyClass()
# Fill c with data.

# Pickle and write the MyClass instance within the __main__ namespace.
# A protocol of -1 selects the highest protocol available.
with open("my_c.pik", "wb") as f:
    pickle.dump(c, f, -1)

Running hexdump -C my_c.pik shows that the first few bytes contain __main__.MyClass. From this I assume that the class is recorded as defined in the global namespace, and that the loading side somehow needs it there to read the pickle. Now I'd like to load this pickled MyClass instance from within a Pyramid view, which I assume runs in a different namespace:

import pickle

# In Pyramid (different namespace) read the pickled MyClass instance.
with open("my_c.pik", "rb") as f:
    c = pickle.load(f)

But that results in the following error:

File ".../views.py", line 60, in view_handler_bla
  c = pickle.load(f)
AttributeError: 'module' object has no attribute 'MyClass'

It seems to me that the MyClass definition is missing in whatever namespace the view code executes in. I had hoped (assumed) that pickling is a somewhat opaque process that allows me to read a blob of data into whichever place I choose. (More on Python's class names and namespaces is here.)
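The suspicion can be checked directly: pickle stores an instance's class as a (module, name) reference and looks it up again at load time. The Python 3 sketch below reproduces the error in a single process; it assumes a hypothetical throwaway module named writer_main standing in for the writing program's __main__.

```python
import pickle
import pickletools
import sys
import types

# Hypothetical stand-in for the writing program's __main__ module.
writer = types.ModuleType("writer_main")
sys.modules["writer_main"] = writer


class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}
        self.more = {"a": 1}


# Make pickle record the class as writer_main.MyClass, the same way the
# question's pickle records __main__.MyClass.
MyClass.__module__ = "writer_main"
writer.MyClass = MyClass

blob = pickle.dumps(MyClass(), pickle.HIGHEST_PROTOCOL)
pickletools.dis(blob)  # the disassembly lists 'writer_main' and 'MyClass'

# The "loading program" has no writer_main.MyClass, so the lookup fails.
del writer.MyClass
try:
    pickle.loads(blob)
    load_failed = False
except AttributeError as exc:
    load_failed = True
    print("AttributeError:", exc)
```

Only the class reference is a problem here; the attribute values themselves are plain built-in objects.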

How can I handle this properly? (Ideally without having to import stuff across...) Can I somehow find the current namespace and inject MyClass (like this answer seems to suggest)?
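The injection asked about here would look roughly like the following Python 3 sketch. It assumes the loading module (e.g. views.py) has its own MyClass, and uses a hypothetical helper load_main_pickle. Note that it mutates the process-wide __main__ module, which in a Pyramid server is the server's entry script, so it is a side effect to weigh.

```python
import pickle
import sys


class MyClass:
    """Loader-side MyClass, e.g. defined in (or imported into) views.py."""

    def __init__(self):
        self.data = set()
        self.more = dict()

    def do_sth(self):
        return sorted(self.data)


def load_main_pickle(path):
    # pickle looks the class up by module and name, roughly
    # getattr(sys.modules["__main__"], "MyClass"), so expose this
    # module's class there if __main__ does not already have one.
    main_module = sys.modules["__main__"]
    if not hasattr(main_module, "MyClass"):
        main_module.MyClass = MyClass
    with open(path, "rb") as f:
        return pickle.load(f)
```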

Poor Solution

It seems to me that if I refrained from defining and using MyClass and instead fell back to plain built-in data types, this wouldn't be a problem. In fact, I could "serialize" the MyClass object as a sequence of calls that pickle the individual elements of the MyClass instance:

# 'Manual' serialization of c works, because all elements are built-in types.
pickle.dump(c.data, f, -1)
pickle.dump(c.more, f, -1)
...

This would defeat the purpose of wrapping data into classes though.
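The manual approach can at least be collapsed into a single call by pickling the instance's attribute dictionary. This Python 3 sketch assumes, as the question does, that every attribute holds built-in types; save_state and load_state are hypothetical helper names. The loader can either keep the plain dict or rebuild an instance of whatever class it has.

```python
import pickle


class MyClass:
    def __init__(self):
        self.data = set()
        self.more = dict()


def save_state(obj, path):
    # Pickle only the attribute dict; with built-in values inside, the
    # pickle contains no reference to MyClass or __main__.
    with open(path, "wb") as f:
        pickle.dump(vars(obj), f, pickle.HIGHEST_PROTOCOL)


def load_state(path, cls=None):
    with open(path, "rb") as f:
        state = pickle.load(f)
    if cls is None:
        return state  # plain dict of attributes
    # Rebuild an instance of the loader's class, skipping __init__
    # and restoring the attributes directly.
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj
```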

Note

Pickling stores only the state of an instance (its attributes) plus a reference to its class by module and name; it does not store the functions defined in the class body (e.g. do_sth() in the above example). That means that loading a MyClass instance into a different namespace with a substitute class definition restores only the instance data; calling a method the substitute lacks, like do_sth(), raises an AttributeError.

asked Nov 02 '14 06:11 by Jens

2 Answers

Use dill instead of pickle: by default, dill pickles a class defined in __main__ by serializing the class definition rather than by reference, so the loading program does not need the class.

>>> import dill
>>> class MyClass:
...   def __init__(self): 
...     self.data = set()
...     self.more = dict()
...   def do_stuff(self):
...     return sorted(self.more)
... 
>>> c = MyClass()
>>> c.data.add(1)
>>> c.data.add(2)
>>> c.data.add(3)
>>> c.data
set([1, 2, 3])
>>> c.more['1'] = 1
>>> c.more['2'] = 2
>>> c.more['3'] = lambda x:x
>>> def more_stuff(self, x):  
...   return x+1
... 
>>> MyClass.more_stuff = more_stuff
>>> 
>>> with open('my_c.pik', "wb") as f:
...   dill.dump(c, f)
... 
>>> 

Shut down the session, and restart in a new session…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('my_c.pik', "rb") as f:
...   c = dill.load(f)
... 
>>> c.data
set([1, 2, 3])
>>> c.more
{'1': 1, '3': <function <lambda> at 0x10473ec80>, '2': 2}
>>> c.do_stuff()
['1', '2', '3']
>>> c.more_stuff(5)
6

Get dill here: https://github.com/uqfoundation/dill

answered Sep 28 '22 14:09 by Mike McKerns


Solution 1

When pickle.load runs, the loading program's __main__ module needs an attribute called MyClass, normally a class. It does not need to be the original class with the original source code; you can give it different methods, and pickle will restore the instance's attributes into it.

import pickle

# Minimal stand-in: pickle only needs to find a class named MyClass.
class MyClass(object):
    pass

with open("my_c.pik", "rb") as f:
    c = pickle.load(f)
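A stub like this only helps if it lives in the loading process's __main__ module, and a Pyramid views module usually is not __main__. A standard alternative is to subclass pickle.Unpickler and override its documented find_class hook so the recorded __main__.MyClass resolves to a class defined wherever the view lives. The Python 3 sketch below assumes that; CLASS_MAP, MainRedirectUnpickler and load_my_c are hypothetical names.

```python
import pickle


class MyClass:
    """Loader-side MyClass, e.g. defined in or imported into views.py."""

    def __init__(self):
        self.data = set()
        self.more = dict()

    def do_sth(self):
        return sorted(self.data)


class MainRedirectUnpickler(pickle.Unpickler):
    # Hypothetical mapping from the (module, name) pairs recorded in the
    # pickle to classes available in the loading program. It holds the
    # class objects themselves, so later changes to __main__ don't matter.
    CLASS_MAP = {("__main__", "MyClass"): MyClass}

    def find_class(self, module, name):
        cls = self.CLASS_MAP.get((module, name))
        if cls is not None:
            return cls
        # Anything else is resolved the normal way.
        return super().find_class(module, name)


def load_my_c(path):
    with open(path, "rb") as f:
        return MainRedirectUnpickler(f).load()
```

Unlike injecting into __main__, this leaves global module state untouched and keeps the redirection local to the one load call.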

Solution 2

Use the copyreg module (copy_reg in Python 2), which registers reduction functions that control how instances of a given type are pickled and reconstructed. This is the example given in the module's source for complex numbers:

import copyreg

def pickle_complex(c):
    return complex, (c.real, c.imag)

copyreg.pickle(complex, pickle_complex, complex)
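Applied to the question, one possible reduction (an assumption, not the only choice) is to turn a MyClass instance into a call to the built-in dict with its attributes, so the pickle references only builtins and any program can load it, as a plain dict. A Python 3 sketch; reduce_myclass is a hypothetical name:

```python
import copyreg
import pickle


class MyClass:
    def __init__(self):
        self.data = set()
        self.more = dict()


def reduce_myclass(obj):
    # Tell pickle to rebuild the object as dict(<attribute dict>), so the
    # pickle references only the built-in dict, not __main__.MyClass.
    return dict, (vars(obj),)


# Registration is process-wide: every pickle of a MyClass instance uses it.
copyreg.pickle(MyClass, reduce_myclass)
```

The trade-off is that the loader gets a dict back rather than a MyClass. In Python 3.3 and later, setting a dispatch_table attribute on a single pickle.Pickler instance scopes the reduction to that pickler instead of registering it globally.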

Solution 3

Define persistent_id on a Pickler subclass and persistent_load on an Unpickler subclass. pickler.persistent_id(obj) returns an identifier for obj (or None to pickle obj normally), and unpickler.persistent_load(pid) resolves that identifier back to an object.
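As a Python 3 sketch of this idea (an assumption about how one might use it here), the pickler can replace each MyClass instance with a tag plus its attribute dict, and the unpickler can rebuild it with whatever class the loading program supplies. StatePickler, StateUnpickler and the "MyClass" tag are hypothetical names.

```python
import pickle


class MyClass:
    def __init__(self):
        self.data = set()
        self.more = dict()


class StatePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Replace MyClass instances by a plain tag plus their attribute
        # dict, so the pickle never stores a reference to __main__.MyClass.
        if isinstance(obj, MyClass):
            return ("MyClass", vars(obj))
        return None  # everything else is pickled normally


class StateUnpickler(pickle.Unpickler):
    def __init__(self, file, cls):
        super().__init__(file)
        self.cls = cls  # whichever class the loading program wants to use

    def persistent_load(self, pid):
        tag, state = pid
        if tag != "MyClass":
            raise pickle.UnpicklingError("unsupported persistent id: %r" % (tag,))
        obj = self.cls.__new__(self.cls)  # skip __init__, restore state only
        obj.__dict__.update(state)
        return obj
```

Usage: `StatePickler(f, pickle.HIGHEST_PROTOCOL).dump(c)` on the writing side, and `StateUnpickler(f, MyClass).load()` on the loading side, where MyClass is the loader's own definition.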

answered Sep 28 '22 14:09 by User