Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python's __reduce__/copy_reg semantic and stateful unpickler

Tags:

python

pickle

I want to implement pickling support for objects belonging to my extension library. There is a global instance of class Service initialized at startup. All these objects are produced as a result of some Service method invocations and essentially belong to it. Service knows how to serialize them into binary buffers and how deserialize buffers back into objects.

It appeared that Pythons __ reduce__ should serve my purpose - implement pickling support. I started implementing one and realized that there is an issue with unpickler (first element od a tuple expected to be returned by __ reduce__). This unpickle function needs instance of a Service to be able to convert input buffer into an Object. Here is a bit of pseudo code to illustrate the issue:

class Service(object):
   ...
   def pickleObject(self,obj):
      # do serialization here and return buffer
      ...

   def unpickleObject(self,buffer):
      # do deserialization here and return new Object
      ...

class Object(object):
    ...
    def __reduce__(self):
        return self.service().unpickleObject, (self.service().pickleObject(self),)

Note the first element in a tuple. Python pickler does not like it: it says it is instancemethod and it can't be pickled. Obviously pickler is trying to store the routine into the output and wants Service instance along with function name, but this is not want I want to happen. I do not want (and really can't : Service is not pickable) to store service along with all the objects. I want service instance to be created before pickle.load is invoked and somehow that instance get used during unpickling.

Here where I came by copy_reg module. Again it appeared as it should solve my problems. This module allows to register pickler and unpickler routines per type dynamically and these are supposed to be used later on for the objects of this type. So I added this registration to the Service construction:

class Service(object):
  ...
  def __init__(self):
      ...
      import copy_reg
      copy_reg( mymodule.Object, self.pickleObject, self.unpickleObject )

self.unpickleObject is now a bound method taking service as a first parameter and buffer as second. self.pickleObject is also bound method taking service and object to pickle. copy_reg required that pickleObject routine should follow reducer semantic and returns similar tuple as before. And here the problem arose again: what should I return as the first tuple element??

class Service(object):
  ...
  def pickleObject(self,obj):
      ...
      return self.unpickleObject, (self.serialize(obj),)

In this form pickle again complains that it can't pickle instancemethod. I tried None - it does not like it either. I put there some dummy function. This works - meaning serialization phase went through fine, but during unpickling it calls this dummy function instead of unpickler I registered for the type mymodule.Object in Service constructor.

So now I am at loss. Sorry for long explanation: I did not know how to ask this question in a few lines. I can summarize my questions like this:

  1. Why does copy_reg semantic requires me to return unpickler routine from pickleObject, if I an expected to register one independently?
  2. Is there any reason to prefer copy_reg.constructor interface to register unpickler routine?
  3. How do I make pickle to use the unpickler I registered instead of one inside the stream?
  4. What should I return as first element in a tuple as pickleObject result value? Is there a "correct" value?
  5. Do I approach this whole thing correctly? Is there different/simpler solution?
like image 390
Gennadiy Rozental Avatar asked Aug 22 '12 02:08

Gennadiy Rozental


People also ask

What is __ reduce __ Python?

Whenever an object is pickled, the __reduce__ method defined by it gets called. This method returns either a string, which may represent the name of a Python global, or a tuple describing how to reconstruct this object when unpickling.

What is pickling and Unpickling in Python with example?

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.

What is meant by pickling in Python?

Python pickle module is used for serializing and de-serializing a Python object structure. Any object in Python can be pickled so that it can be saved on disk. What pickle does is that it “serializes” the object first before writing it to file. Pickling is a way to convert a python object (list, dict, etc.)

What is pickle dump in Python?

Python Pickle dump dump() function to store the object data to the file. pickle. dump() function takes 3 arguments. The first argument is the object that you want to store. The second argument is the file object you get by opening the desired file in write-binary (wb) mode.


1 Answers

First of all, the copy_reg module is unlikely to help you much here: it is primarily a way to add __reduce__ like features to classes that don't have that method rather than offering any special abilities (e.g. if you want to pickle objects from some library that doesn't natively support it).

The callable returned by __reduce__ needs to be locatable in the environment where the object is to be unpickled, so an instance method isn't really appropriate. As mentioned in the Pickle documentation:

In the unpickling environment this object must be either a class, a callable registered as a “safe constructor” (see below), or it must have an attribute __safe_for_unpickling__ with a true value.

So if you defined a function (not method) as follows:

def _unpickle_service_object(buffer):
    # Grab the global service object, however that is accomplished
    service = get_global_service_object()
    return service.unpickleObject(buffer)

_unpickle_service_object.__safe_for_unpickling__ = True

You could now use this _unpickle_service_object function in the return value of your __reduce__ methods so that your objects linked to the new environment's global Service object when unpickled.

like image 154
James Henstridge Avatar answered Oct 18 '22 18:10

James Henstridge