Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find source of error in Python Pickle on massive object

I've taken over somebody's code for a fairly large project. I'm trying to save program state, and there's one massive object which stores pretty much all the other objects. I'm trying to pickle this object, but I get this error:

pickle.PicklingError: Can't pickle : it's not found as builtin.module

From what I can find on google, this is because somewhere I'm importing something outside of python init, or that a class attribute is referencing a module. So, I've got a two questions:

  1. Can anybody confirm that that's why this error is being given? Am I looking for the right things in my code?

  2. Is there a way to find what line of code/object member is causing the difficulties in pickle? The traceback only gives the line in pickle where the error occurs, not the line of the object being pickled.

like image 344
jmite Avatar asked Jul 05 '11 23:07

jmite


People also ask

What objects Cannot be pickled in Python?

Classes, functions, and methods cannot be pickled -- if you pickle an object, the object's class is not pickled, just a string that identifies what class it belongs to.

Are pickles unsafe in Python?

Pickle is unsafe because it constructs arbitrary Python objects by invoking arbitrary functions. However, this is also gives it the power to serialize almost any Python object, without any boilerplate or even white-/black-listing (in the common case).

Why are pickles insecure in Python?

Pickle's ProsPickle constructs arbitrary Python objects by invoking arbitrary functions, that's why it is not secure. However, this enables it to serialise almost any Python object that JSON and other serialising methods will not do.

How do I Unpickle a pickle file?

As we said earlier, the load() method can be used to unpickle the pickled Python object. You have to first open the pickled file using rb (read-binary) permission and pass the opened file to the load() method, as shown below. The load() method unpickles the data and returns the actual object.


2 Answers

2) You can subclass pickle.Pickler and monkey-patch it to show a log of what it's pickling. This should make it easier to trace where the problem is.

import pickle
class MyPickler (pickle.Pickler):
    def save(self, obj):
        print 'pickling object', obj, 'of type', type(obj)
        pickle.Pickler.save(self, obj)

This will only work with the Python implementation of pickle.Pickler. In Python 3.x, the pickle module uses the C implementation by default, the pure-Python version of Pickler is called _Pickler.

# Python 3.x
import pickle
class MyPickler (pickle._Pickler):
    def save(self, obj):
        print ('pickling object  {0} of type {1}'.format(obj, type(obj))
        pickle._Pickler.save(self, obj)
like image 94
tjollans Avatar answered Oct 15 '22 11:10

tjollans


Something like this exists in dill. Let's look at a list of objects, and see what we can do:

>>> import dill
>>> f = open('whatever', 'w')
>>> f.close()
>>> 
>>> l = [iter([1,2,3]), xrange(5), open('whatever', 'r'), lambda x:x]
>>> dill.detect.trace(False)
>>> dill.pickles(l)
False

Ok, dill fails to pickle the list. So what's the problem?

>>> dill.detect.trace(True)
>>> dill.pickles(l)
T4: <type 'listiterator'>
False

Ok, the first item in the list fails to pickle. What about the rest?

>>> map(dill.pickles, l)
T4: <type 'listiterator'>
Si: xrange(5)
F2: <function _eval_repr at 0x106991cf8>
Fi: <open file 'whatever', mode 'r' at 0x10699c810>
F2: <function _create_filehandle at 0x106991848>
B2: <built-in function open>
F1: <function <lambda> at 0x1069f6848>
F2: <function _create_function at 0x1069916e0>
Co: <code object <lambda> at 0x105a0acb0, file "<stdin>", line 1>
F2: <function _unmarshal at 0x106991578>
D1: <dict object at 0x10591d168>
D2: <dict object at 0x1069b1050>
[False, True, True, True]

Hm. The other objects pickle just fine. So, let's replace the first object.

>>> dill.detect.trace(False)
>>> l[0] = xrange(1,4)
>>> dill.pickles(l)
True
>>> _l = dill.loads(dill.dumps(l))

Now our object pickles. Well, we could be taking advantage of some built-in object sharing that happens for pickling on linux/unix/mac… so what about a stronger check, like actually pickling across a sub-process (like happens on windows)?

>>> dill.check(l)        
[xrange(1, 4), xrange(5), <open file 'whatever', mode 'r' at 0x107998810>, <function <lambda> at 0x1079ec410>]
>>> 

Nope, the list still works… so this is an object that could be sent to another process successfully.

Now, with regard to your error, which everyone seemed to ignore…

The ModuleType object is not pickleable, and that's causing your error.

>>> import types
>>> types.ModuleType 
<type 'module'>
>>>
>>> import pickle
>>> pickle.dumps(types.ModuleType)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type 'module'>: it's not found as __builtin__.module

However, if we import dill, it magically works.

>>> import dill
>>> pickle.dumps(types.ModuleType)
"cdill.dill\n_load_type\np0\n(S'ModuleType'\np1\ntp2\nRp3\n."
>>> 
like image 40
Mike McKerns Avatar answered Oct 15 '22 10:10

Mike McKerns