Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Pickling a dict with some unpicklable items

I have an object gui_project which has an attribute .namespace, which is a namespace dict. (i.e. a dict from strings to objects.)

(This is used in an IDE-like program to let the user define his own object in a Python shell.)

I want to pickle this gui_project, along with the namespace. Problem is, some objects in the namespace (i.e. values of the .namespace dict) are not picklable objects. For example, some of them refer to wxPython widgets.

I'd like to filter out the unpicklable objects, that is, exclude them from the pickled version.

How can I do this?

(One thing I tried is to go one by one on the values and try to pickle them, but some infinite recursion happened, and I need to be safe from that.)

(I do implement a GuiProject.__getstate__ method right now, to get rid of other unpicklable stuff besides namespace.)

like image 752
Ram Rachum Avatar asked Nov 02 '10 18:11

Ram Rachum


People also ask

What objects Cannot be pickled in Python?

Generally you can pickle any object if you can pickle every attribute of that object. Classes, functions, and methods cannot be pickled -- if you pickle an object, the object's class is not pickled, just a string that identifies what class it belongs to.

Can you pickle a dictionary Python?

In general, pickling a dict will fail unless you have only simple objects in it, like strings and integers. Even a really simple dict will often fail. It just depends on the contents.

What is the difference between pickling and Unpickling in Python?

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.

Does pickle overwrite?

Pickle dump replaces current file data.


1 Answers

I would use the pickler's documented support for persistent object references. Persistent object references are objects that are referenced by the pickle but not stored in the pickle.

http://docs.python.org/library/pickle.html#pickling-and-unpickling-external-objects

ZODB has used this API for years, so it's very stable. When unpickling, you can replace the object references with anything you like. In your case, you would want to replace the object references with markers indicating that the objects could not be pickled.

You could start with something like this (untested):

import cPickle

def persistent_id(obj):
    if isinstance(obj, wxObject):
        return "filtered:wxObject"
    else:
        return None

class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)

def persistent_load(obj_id):
    if obj_id.startswith('filtered:'):
        return FilteredObject(obj_id[9:])
    else:
        raise cPickle.UnpicklingError('Invalid persistent id')

def dump_filtered(obj, file):
    p = cPickle.Pickler(file)
    p.persistent_id = persistent_id
    p.dump(obj)

def load_filtered(file)
    u = cPickle.Unpickler(file)
    u.persistent_load = persistent_load
    return u.load()

Then just call dump_filtered() and load_filtered() instead of pickle.dump() and pickle.load(). wxPython objects will be pickled as persistent IDs, to be replaced with FilteredObjects at unpickling time.

You could make the solution more generic by filtering out objects that are not of the built-in types and have no __getstate__ method.

Update (15 Nov 2010): Here is a way to achieve the same thing with wrapper classes. Using wrapper classes instead of subclasses, it's possible to stay within the documented API.

from cPickle import Pickler, Unpickler, UnpicklingError


class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)


class MyPickler(object):

    def __init__(self, file, protocol=0):
        pickler = Pickler(file, protocol)
        pickler.persistent_id = self.persistent_id
        self.dump = pickler.dump
        self.clear_memo = pickler.clear_memo

    def persistent_id(self, obj):
        if not hasattr(obj, '__getstate__') and not isinstance(obj,
            (basestring, int, long, float, tuple, list, set, dict)):
            return "filtered:%s" % type(obj)
        else:
            return None


class MyUnpickler(object):

    def __init__(self, file):
        unpickler = Unpickler(file)
        unpickler.persistent_load = self.persistent_load
        self.load = unpickler.load
        self.noload = unpickler.noload

    def persistent_load(self, obj_id):
        if obj_id.startswith('filtered:'):
            return FilteredObject(obj_id[9:])
        else:
            raise UnpicklingError('Invalid persistent id')


if __name__ == '__main__':
    from cStringIO import StringIO

    class UnpickleableThing(object):
        pass

    f = StringIO()
    p = MyPickler(f)
    p.dump({'a': 1, 'b': UnpickleableThing()})

    f.seek(0)
    u = MyUnpickler(f)
    obj = u.load()
    print obj

    assert obj['a'] == 1
    assert isinstance(obj['b'], FilteredObject)
    assert obj['b'].about
like image 51
Shane Hathaway Avatar answered Oct 25 '22 23:10

Shane Hathaway