Everyone knows pickle is not a secure way to store user data. It even says so on the box.
I'm looking for examples of strings or data structures that break pickle parsing in the current supported versions of cPython >= 2.4
. Are there things that can be pickled but not unpickled? Are there problems with particular unicode characters? Really big data structures? Obviously the old ASCII protocol has some issues, but what about the most current binary form?
I'm particularly curious about ways in which the pickle loads
operation can fail, especially when given a string produced by pickle itself. Are there any circumstances in which pickle will continue parsing past the .
?
What sort of edge cases are there?
Edit: Here are some examples of the sort of thing I'm looking for:
__setitem__
before instance variables are set with __setstate__
. This can be a gotcha when pickling Cookie objects. See http://bugs.python.org/issue964868 and http://bugs.python.org/issue826897
STOP
marker does not actually stop parsing - when the marker exists as part of a literal, or more generally, when not preceded by a newline.Pickle in Python is primarily used in serializing and deserializing a Python object structure. In other words, it's the process of converting a Python object into a byte stream to store it in a file/database, maintain program state across sessions, or transport data over the network.
The pickle module implements serialization protocol, which provides an ability to save and later load Python objects using special binary format. Unlike json , pickle is not limited to simple objects. It can also store references to functions and classes, as well as the state of class instances.
Python Pickle dump dump() function to store the object data to the file.
Pickled files are Python-version-specific — You might encounter issues when saving files in one Python version and reading them in the other. Try to work in identical Python versions, if possible. Pickling doesn't compress data — Pickling an object won't compress it.
This is a greatly simplified example of what pickle didn't like about my data structure.
import cPickle as pickle
class Member(object):
def __init__(self, key):
self.key = key
self.pool = None
def __hash__(self):
return self.key
class Pool(object):
def __init__(self):
self.members = set()
def add_member(self, member):
self.members.add(member)
member.pool = self
member = Member(1)
pool = Pool()
pool.add_member(member)
with open("test.pkl", "w") as f:
pickle.dump(member, f, pickle.HIGHEST_PROTOCOL)
with open("test.pkl", "r") as f:
x = pickle.load(f)
Pickle is known to be a little funny with circular structures, but if you toss custom hash functions and sets/dicts into the mix then things get quite hairy.
In this particular example it partially unpickles the member and then encounters the pool. So it then partially unpickles the pool and encounters the members set. So it creates the set and tries to add the partially unpickled member to the set. At which point it dies in the custom hash function, because the member is only partially unpickled. I dread to think what might happen if you had an "if hasattr..." in the hash function.
$ python --version
Python 2.6.5
$ python test.py
Traceback (most recent call last):
File "test.py", line 25, in <module>
x = pickle.load(f)
File "test.py", line 8, in __hash__
return self.key
AttributeError: ("'Member' object has no attribute 'key'", <type 'set'>, ([<__main__.Member object at 0xb76cdaac>],))
If you are interested in how things fail with pickle
(or cPickle
, as it's just a slightly different import), you can use this growing list of all the different object types in python to test against fairly easily.
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
The package dill
includes functions that discover how an object fails to pickle, for example by catching the error it throws and returning it to the user.
dill.dill
has these functions, which you could also build for pickle
or cPickle
, simply with a cut-and-paste and an import pickle
or import cPickle as pickle
(or import dill as pickle
):
def copy(obj, *args, **kwds):
"""use pickling to 'copy' an object"""
return loads(dumps(obj, *args, **kwds))
# quick sanity checking
def pickles(obj,exact=False,safe=False,**kwds):
"""quick check if object pickles with dill"""
if safe: exceptions = (Exception,) # RuntimeError, ValueError
else:
exceptions = (TypeError, AssertionError, PicklingError, UnpicklingError)
try:
pik = copy(obj, **kwds)
try:
result = bool(pik.all() == obj.all())
except AttributeError:
result = pik == obj
if result: return True
if not exact:
return type(pik) == type(obj)
return False
except exceptions:
return False
and includes these in dill.detect
:
def baditems(obj, exact=False, safe=False): #XXX: obj=globals() ?
"""get items in object that fail to pickle"""
if not hasattr(obj,'__iter__'): # is not iterable
return [j for j in (badobjects(obj,0,exact,safe),) if j is not None]
obj = obj.values() if getattr(obj,'values',None) else obj
_obj = [] # can't use a set, as items may be unhashable
[_obj.append(badobjects(i,0,exact,safe)) for i in obj if i not in _obj]
return [j for j in _obj if j is not None]
def badobjects(obj, depth=0, exact=False, safe=False):
"""get objects that fail to pickle"""
if not depth:
if pickles(obj,exact,safe): return None
return obj
return dict(((attr, badobjects(getattr(obj,attr),depth-1,exact,safe)) \
for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
def badtypes(obj, depth=0, exact=False, safe=False):
"""get types for objects that fail to pickle"""
if not depth:
if pickles(obj,exact,safe): return None
return type(obj)
return dict(((attr, badtypes(getattr(obj,attr),depth-1,exact,safe)) \
for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
and this last function, which is what you can use to test the objects in dill._objects
def errors(obj, depth=0, exact=False, safe=False):
"""get errors for objects that fail to pickle"""
if not depth:
try:
pik = copy(obj)
if exact:
assert pik == obj, \
"Unpickling produces %s instead of %s" % (pik,obj)
assert type(pik) == type(obj), \
"Unpickling produces %s instead of %s" % (type(pik),type(obj))
return None
except Exception:
import sys
return sys.exc_info()[1]
return dict(((attr, errors(getattr(obj,attr),depth-1,exact,safe)) \
for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With