
Python pickle - how does it break?

Everyone knows pickle is not a secure way to store user data. It even says so on the box.

I'm looking for examples of strings or data structures that break pickle parsing in the currently supported versions of CPython (>= 2.4). Are there things that can be pickled but not unpickled? Are there problems with particular Unicode characters? Really big data structures? Obviously the old ASCII protocol has some issues, but what about the most current binary form?

I'm particularly curious about ways in which pickle.loads can fail, especially when given a string produced by pickle itself. Are there any circumstances in which pickle will continue parsing past the STOP marker (.)?

What sort of edge cases are there?

Edit: Here are some examples of the sort of thing I'm looking for:

  • In Python 2.4, you can pickle an array without error, but you can't unpickle it. http://bugs.python.org/issue1281383
  • You can't reliably pickle objects that inherit from dict and call __setitem__ before instance variables are set with __setstate__. This can be a gotcha when pickling Cookie objects. See http://bugs.python.org/issue964868 and http://bugs.python.org/issue826897
  • Python 2.4 (and 2.5?) will return a pickle value for infinity (or values close to it like 1e100000), but may (depending on platform) fail when loading. See http://bugs.python.org/issue880990 and http://bugs.python.org/issue445484
  • This last item is interesting because it reveals a case where the STOP marker does not actually stop parsing - when the marker exists as part of a literal, or more generally, when not preceded by a newline.
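A quick way to probe the last two points on a current interpreter is sketched below. It assumes Python 3 and the standard-library pickle module, not the 2.4-era versions from the bug reports, so it only shows how a modern loader behaves: for each protocol it pickles infinity, checks that the payload ends with the STOP opcode, and appends junk after STOP to see whether loading reads past it.

```python
import pickle

# Assumption: Python 3 standard-library pickle, not the versions in the bug reports.
results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(float("inf"), protocol)
    # Every pickle ends with the STOP opcode, a single "." byte.
    ends_with_stop = payload.endswith(pickle.STOP)
    # Append bytes after STOP to see whether the loader reads past it.
    restored = pickle.loads(payload + b"trailing bytes")
    results[protocol] = (ends_with_stop, restored)

for protocol, (ends_with_stop, restored) in sorted(results.items()):
    print(protocol, ends_with_stop, restored)
```

On such an interpreter the trailing bytes should be ignored and infinity should round-trip under every protocol, which suggests these particular failures are specific to the older releases.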
Paul McMillan asked Nov 09 '10 09:11





2 Answers

This is a greatly simplified example of what pickle didn't like about my data structure.

import cPickle as pickle

class Member(object):
    def __init__(self, key):
        self.key = key
        self.pool = None
    def __hash__(self):
        return self.key

class Pool(object):
    def __init__(self):
        self.members = set()
    def add_member(self, member):
        self.members.add(member)
        member.pool = self

member = Member(1)
pool = Pool()
pool.add_member(member)

with open("test.pkl", "wb") as f:  # binary mode: HIGHEST_PROTOCOL is a binary format
    pickle.dump(member, f, pickle.HIGHEST_PROTOCOL)

with open("test.pkl", "rb") as f:
    x = pickle.load(f)

Pickle is known to be a little funny with circular structures, but if you toss custom hash functions and sets/dicts into the mix then things get quite hairy.

In this particular example, pickle partially unpickles the member and then encounters the pool. It then partially unpickles the pool and encounters the members set, so it creates the set and tries to add the partially unpickled member to it. At that point it dies in the custom hash function, because the member does not have its attributes yet. I dread to think what might happen if you had an "if hasattr..." in the hash function.

$ python --version
Python 2.6.5
$ python test.py
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    x = pickle.load(f)
  File "test.py", line 8, in __hash__
    return self.key
AttributeError: ("'Member' object has no attribute 'key'", <type 'set'>, ([<__main__.Member object at 0xb76cdaac>],))
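The same ordering problem can be reproduced in memory. The sketch below is a Python 3 port of the example above (an assumption: the original was run on Python 2.6.5 with cPickle) and contrasts two choices of root object. With the member as the root, the set is rebuilt before the member's state is restored; with the pool as the root, the member is complete by the time it is hashed. Choosing the root this way is only one possible way around the problem and assumes you control what gets pickled first.

```python
import pickle

class Member:
    def __init__(self, key):
        self.key = key
        self.pool = None
    def __hash__(self):
        return self.key  # needs the instance state to be restored first

class Pool:
    def __init__(self):
        self.members = set()
    def add_member(self, member):
        self.members.add(member)
        member.pool = self

member = Member(1)
pool = Pool()
pool.add_member(member)

# Root = member: the set is rebuilt while the member still has no attributes.
try:
    pickle.loads(pickle.dumps(member, pickle.HIGHEST_PROTOCOL))
    member_error = None
except AttributeError as exc:
    member_error = exc

# Root = pool: the member's state is restored before it is added to the set.
restored_pool = pickle.loads(pickle.dumps(pool, pickle.HIGHEST_PROTOCOL))
restored_member = next(iter(restored_pool.members))
print("member as root:", repr(member_error))
print("pool as root:", restored_member.key, restored_member.pool is restored_pool)
```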
Gordon Wrigley answered Oct 18 '22 06:10



If you are interested in how things fail with pickle (or cPickle, as it's just a slightly different import), you can test fairly easily against this growing list of all the different object types in Python:

https://github.com/uqfoundation/dill/blob/master/dill/_objects.py

The package dill includes functions that discover how an object fails to pickle, for example by catching the error it throws and returning it to the user.

dill.dill has these functions, which you could also build for pickle or cPickle, simply with a cut-and-paste and an import pickle or import cPickle as pickle (or import dill as pickle):

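# names the functions below rely on; swap in cPickle or dill as described above
from pickle import dumps, loads, PicklingError, UnpicklingError

def copy(obj, *args, **kwds):
    """use pickling to 'copy' an object"""
    return loads(dumps(obj, *args, **kwds))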


# quick sanity checking
def pickles(obj,exact=False,safe=False,**kwds):
    """quick check if object pickles with dill"""
    if safe: exceptions = (Exception,) # RuntimeError, ValueError
    else:
        exceptions = (TypeError, AssertionError, PicklingError, UnpicklingError)
    try:
        pik = copy(obj, **kwds)
        try:
            result = bool(pik.all() == obj.all())
        except AttributeError:
            result = pik == obj
        if result: return True
        if not exact:
            return type(pik) == type(obj)
        return False
    except exceptions:
        return False
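As a rough illustration of the same idea with only the standard library, here is a minimal sketch assuming Python 3. The helper names roundtrip and survives_pickle are hypothetical and are not part of dill or pickle; the check is also simpler than dill's, since it has no exact/safe switches and no special handling for array-like objects.

```python
import pickle
import threading

def roundtrip(obj, **kwds):
    """Copy an object by pickling and unpickling it with the standard library."""
    return pickle.loads(pickle.dumps(obj, **kwds))

def survives_pickle(obj, **kwds):
    """Return True if obj round-trips to an equal object of the same type."""
    try:
        clone = roundtrip(obj, **kwds)
    except (pickle.PickleError, TypeError, AttributeError):
        return False
    return type(clone) is type(obj) and clone == obj

print(survives_pickle({"a": [1, 2, 3]}))   # plain containers should round-trip
print(survives_pickle(threading.Lock()))   # locks refuse to pickle
```

Keyword arguments such as protocol=0 are passed straight through to pickle.dumps, so the same check can be repeated per protocol.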

and includes these in dill.detect:

def baditems(obj, exact=False, safe=False): #XXX: obj=globals() ?
    """get items in object that fail to pickle"""
    if not hasattr(obj,'__iter__'): # is not iterable
        return [j for j in (badobjects(obj,0,exact,safe),) if j is not None]
    obj = obj.values() if getattr(obj,'values',None) else obj
    _obj = [] # can't use a set, as items may be unhashable
    [_obj.append(badobjects(i,0,exact,safe)) for i in obj if i not in _obj]
    return [j for j in _obj if j is not None]


def badobjects(obj, depth=0, exact=False, safe=False):
    """get objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return obj
    return dict(((attr, badobjects(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

def badtypes(obj, depth=0, exact=False, safe=False):
    """get types for objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return type(obj)
    return dict(((attr, badtypes(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

and this last function, which is what you can use to test the objects in dill._objects:

def errors(obj, depth=0, exact=False, safe=False):
    """get errors for objects that fail to pickle"""
    if not depth:
        try:
            pik = copy(obj)
            if exact:
                assert pik == obj, \
                    "Unpickling produces %s instead of %s" % (pik,obj)
            assert type(pik) == type(obj), \
                "Unpickling produces %s instead of %s" % (type(pik),type(obj))
            return None
        except Exception:
            import sys
            return sys.exc_info()[1]
    return dict(((attr, errors(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
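For a feel of what such an error report looks like without installing dill, the following sketch applies the same catch-and-return approach to the values of a dictionary. It assumes Python 3 and the standard-library pickle; pickle_error and failing_items are hypothetical names for this illustration and are not dill functions.

```python
import pickle
import threading

def pickle_error(obj):
    """Return the exception raised by a pickle round trip, or None on success."""
    try:
        pickle.loads(pickle.dumps(obj))
        return None
    except Exception as exc:  # deliberately broad: this is a diagnostic helper
        return exc

def failing_items(mapping):
    """Map each key whose value fails to pickle to the error it raised."""
    report = {}
    for key, value in mapping.items():
        error = pickle_error(value)
        if error is not None:
            report[key] = error
    return report

sample = {"number": 42, "text": "spam", "lock": threading.Lock()}
report = failing_items(sample)
for key, error in report.items():
    print(key, type(error).__name__, error)
```

Only the lock should show up in the report; the same loop could be pointed at any dictionary of test objects.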
Mike McKerns answered Oct 18 '22 06:10
