
How to check which detail of a complex object cannot be pickled

Overview

I want to serialize my complex objects. It looks simple but every step creates a different problem.

In the end, other programmers must also be able to create a complex object that inherits from my parent object, and that object must be picklable under both Python 2.7 and Python 3.x.

I started with a simple object and used pickle.dump and pickle.load with success.

I then created multiple complex objects (similar but not identical), some of which can be dumped, and a few cannot.

Debugging

The pickle library knows which objects can be pickled or not. In theory this means pdb could be customized to enable pickle debugging.
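
Short of customizing pdb, a first check is to pickle each instance attribute on its own and note which ones fail. The following is a minimal sketch under that assumption; the helper name unpicklable_attributes is made up for illustration and it only looks at attributes stored in the instance __dict__.

```python
import pickle


def unpicklable_attributes(obj):
    """Return {attribute name: error message} for attributes that fail to pickle."""
    failures = {}
    # vars() only sees instance attributes stored in __dict__; objects using
    # __slots__ or C-level state would need a different approach.
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several exception types
            failures[name] = "{0}: {1}".format(type(exc).__name__, exc)
    return failures
```

This only inspects one level. An attribute reported here may itself be a container, in which case the same helper (or a loop over its items) can be applied to it.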

Alternative serialization libraries

I wanted a reliable serialization independent of the content of the object. So I searched for other serialization tools:

  • Cerealizer, whose self-test failed and which seems to be outdated.
  • MessagePack, which did not seem to be available for Python 3.
  • JSON, which failed with the error: builtins.TypeError: <lib.scan.Content object at 0x7f37f1e5da50> is not JSON serializable
  • marshal and shelve, but the documentation for both refers back to pickle.
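
The JSON error above is what json.dumps raises for any object it has no encoder for. A minimal sketch of the usual workaround is the default= hook. The Content class below is a hypothetical stand-in, since the attributes of the real lib.scan.Content are not shown, and to_jsonable is an illustrative name.

```python
import json


class Content(object):
    """Hypothetical stand-in for lib.scan.Content; the real attributes are unknown."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def to_jsonable(obj):
    """Fallback used by json.dumps for objects it cannot encode itself."""
    if hasattr(obj, "__dict__"):
        return vars(obj)  # encode the instance as its attribute dict
    raise TypeError("{0!r} is not JSON serializable".format(obj))


text = json.dumps(Content("path", [1, 2]), default=to_jsonable, sort_keys=True)
print(text)
```

Note that this is one-way: loading the JSON gives back a plain dict, not a Content instance, so it is not a drop-in replacement for pickle.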

Digging into using pickle

I have read "How to check if an object is pickleable", which did not give me an answer.

The closest I found was "How to find source of error in Python Pickle on massive object".

I adjusted this to:

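import pickle
import sys

# The pure-Python pickler is pickle._Pickler on Python 3 and pickle.Pickler on
# Python 2; only the pure-Python class lets a subclass intercept save().
if sys.version_info[0] >= 3:
    class MyPickler(pickle._Pickler):
        def save(self, obj):
            try:
                pickle._Pickler.save(self, obj)
            except Exception:
                # The error is printed and swallowed, so pickling carries on.
                print('pick(3.x) {0} of type {1}'.format(obj, type(obj)))
else:
    class MyPickler(pickle.Pickler):
        def save(self, obj):
            try:
                pickle.Pickler.save(self, obj)
            except Exception:
                print('pick(2.x) {0} of type {1}'.format(obj, type(obj)))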

I call this code using:

import platform

def save(obj, file):
    # Only CPython is handled; other implementations are skipped silently.
    if platform.python_implementation() == 'CPython':
        myPickler = MyPickler(file)
        myPickler.save(obj)

I expect save to run until an exception is raised, and the content of obj to be printed so I can see exactly where the error occurs. But the result is:

pick(3.x)  <class 'module'> of type <class 'type'>
pick(3.x)  <class 'module'> of type <class 'type'>
pick(3.x)  <class 'Struct'> of type <class 'type'>
pick(3.x)  <class 'site.setquit.<locals>.Quitter'> of type <class 'type'>
pick(3.x)  <class 'site.setquit.<locals>.Quitter'> of type <class 'type'>
pick(3.x)  <class 'module'> of type <class 'type'>
pick(3.x)  <class 'sys.int_info'> of type <class 'type'>
...

This is just a small part of the result. I do not understand it. It does not tell me which detail cannot be pickled, or how to solve this.
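
One possible reading of this output is that the except clause prints the error and then swallows it, so pickling continues and every unpicklable object anywhere in the object graph gets its own line. A variant that remembers the chain of objects being saved and re-raises would stop at the first failure instead. The sketch below assumes Python 3 and relies on pickle._Pickler, the private pure-Python pickler already used above; TracingPickler and explain_failure are illustrative names.

```python
import io
import pickle
import sys


class TracingPickler(pickle._Pickler):
    """Pure-Python pickler (Python 3) that records where pickling first failed."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.stack = []           # objects currently being saved, outermost first
        self.failed_chain = None  # type names along the path to the first failure

    def save(self, obj, save_persistent_id=True):
        self.stack.append(obj)
        try:
            super().save(obj, save_persistent_id)
        except Exception:
            if self.failed_chain is None:
                self.failed_chain = [type(o).__name__ for o in self.stack]
            raise  # re-raise so pickling stops at the first failure
        finally:
            self.stack.pop()


def explain_failure(obj):
    """Return the chain of type names leading to the failure, or None if obj pickles."""
    pickler = TracingPickler(io.BytesIO())
    try:
        pickler.dump(obj)
    except Exception:
        return pickler.failed_chain
    return None


print(explain_failure({"name": "modules", "value": sys}))
```

The returned chain reads from the outermost object down to the one that could not be saved, which is usually enough to locate the offending attribute.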

I have seen http://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled, but it does not help me much if I cannot detect which line in my code cannot be pickled.

The code in my complex object works as expected, in the end running generated code such as:

sys.modules['unum']

But when pickling it seems the 'module' is not read as expected.
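
Module objects themselves cannot be pickled, so if the result of sys.modules['unum'] ends up stored on the object, that alone would explain the failure. One way around it is to store the module by name and import it again on load. The sketch below assumes Python 3.3 or later, where a pickler can carry its own dispatch_table; reduce_module and dumps_with_modules are illustrative names.

```python
import copyreg
import importlib
import io
import pickle
import sys
import types


def reduce_module(module):
    """Tell pickle to rebuild a module by importing it again by name."""
    return importlib.import_module, (module.__name__,)


def dumps_with_modules(obj):
    """Pickle obj, storing any module it contains as an import-by-name call."""
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    # Per-pickler dispatch table (Python 3.3+); the global table is left untouched.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.ModuleType] = reduce_module
    pickler.dump(obj)
    return buffer.getvalue()


data = dumps_with_modules({"name": "modules", "module": sys})
restored = pickle.loads(data)
print(restored["module"] is sys)
```

This assumes the module is importable under the same name wherever the data is loaded. For Python 2.7 a different approach would be needed, for example keeping only the module name on the object.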

Attempt at a solution

Some background to clarify what I mean. I have had programs that worked and then suddenly stopped working, perhaps because of an update or some other changed resource. There are also programs that work for others but not for me, and the opposite.

This is a general problem, so I want to develop a program that checks all kinds of resources. The number of different kinds of resources is huge, so I have one parent class with all the general behaviour and a detail class, kept as small as possible, for each specific resource.

This is done in my child resource classes.

These resources have to be checked against different versions, e.g. Python 2.7 or Python 3.3. If you run Python 2.7.5, the resource is valid when Python 2.7 or higher is required, so the check must be a bit more than an equality test. Each resource is checked with a single statement in a custom config file. There is a specific config file for each program, which must be as small as possible.

The general class is about 98% of the code. The specific resources and config are only about 2% of the code, so it is very easy to add new resources to check and new config files for new programs.

This child resource class:

class R_Sys(r_base.R_Base):
    '''
    doc : http://docs.python.org/3/library/sys.html#module-sys

    sys.modules only contains modules that have already been imported

    statement :
    sys.modules['psutil'] #  may return false (installed but not imported)
    but the statements :
    import psutil
    sys.modules['psutil'] # will return true, now psutil is imported
    '''

    allowed_names = ('modules', 'path', 'builtin_module_names', 'stdin')

    allowed_keys_in_dict_config = ('name',)
    allowed_operators = ("R_NONE", "=", 'installed')  # installed only for modules

    class_group = 'Sys'
    module_used = sys   


    def __init__(self, check_type, group, name):
        super(R_Sys, self).__init__(check_type, group, name)

called by this config statement:

sc.analyse(r.R_Sys, c.ct('DETECT'), dict(name='path'))

can be pickled successfully. But with this config statement:

sc.analyse(r.R_Sys, c.ct('DETECT'),
                     dict(name='modules', tuplename='unum') )  

it fails.

In my opinion this means that the 98% of main code should be OK; otherwise the first statement would fail as well.

There are class attributes in the child class, which are required for it to function properly. Again, with the first call the dump executes well. I have not tried a load yet.
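
The difference between the two config statements can be checked directly on the values they refer to, without the framework. This is a minimal sketch; what R_Sys actually stores on the instance is not shown, so the assumption here is that it keeps the value of sys.path or sys.modules (or an entry of it). The helper name pickle_error is illustrative.

```python
import pickle
import sys


def pickle_error(obj):
    """Return None if obj pickles, otherwise the exception that was raised."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return exc
    return None


print(pickle_error(sys.path))     # a list of strings
print(pickle_error(sys.modules))  # a dict whose values are module objects

# Assumption: for an 'installed' check only the names are needed, so a
# picklable stand-in can be built from the keys alone.
imported_names = sorted(sys.modules)
print(pickle_error(imported_names))
```

If the second call reports an error while the first and third do not, the module objects are the detail that cannot be pickled, and keeping only their names on the instance would avoid the problem.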

Asked by Bernard on Mar 06 '14 at 18:03

1 Answer

dill has some good diagnostic tools for pickling, the best of which is the pickle trace (similar to what you have implemented).

Let's build a complex object, and explore:

>>> import dill
>>> class Foo(object):
...   @classmethod
...   def bar(self, x):
...     return self.z + x
...   def baz(self, z):
...     self.z = z
...   z = 1
...   zap = lambda self, x: x + self.bar(x)
... 
>>> f = Foo()
>>> f.zap(3)
7
>>> f.baz(7)
>>> f.z 
7

Turn on "pickle trace":

>>> dill.detect.trace(True)
>>> _f = dill.dumps(f)
T2: <class '__main__.Foo'>
F2: <function _create_type at 0x10f94a668>
T1: <type 'type'>
F2: <function _load_type at 0x10f94a5f0>
T1: <type 'object'>
D2: <dict object at 0x10f96bb40>
Cm: <classmethod object at 0x10f9ad408>
T4: <type 'classmethod'>
F1: <function bar at 0x10f9aa9b0>
F2: <function _create_function at 0x10f94a6e0>
Co: <code object bar at 0x10f9a9130, file "<stdin>", line 2>
F2: <function _unmarshal at 0x10f94a578>
D1: <dict object at 0x10e8d6168>
D2: <dict object at 0x10f96b5c8>
F1: <function baz at 0x10f9aaa28>
Co: <code object baz at 0x10f9a9ab0, file "<stdin>", line 5>
D1: <dict object at 0x10e8d6168>
D2: <dict object at 0x10f969d70>
F1: <function <lambda> at 0x10f9aaaa0>
Co: <code object <lambda> at 0x10f9a9c30, file "<stdin>", line 8>
D1: <dict object at 0x10e8d6168>
D2: <dict object at 0x10f97d050>
D2: <dict object at 0x10e97b4b0>
>>> f_ = dill.loads(_f)
>>> f_.z
7

Ok, dill can pickle this object… so let's make it harder. We first turn off trace.

>>> dill.detect.trace(False)
>>> 
>>> f.y = xrange(5)
>>> f.w = iter([1,2,3])
>>> 
>>> dill.pickles(f)
False

Ok, now dill fails. So what causes the failure? We can look at all of the objects that fail to pickle if we dig into our object f.

>>> dill.detect.badtypes(f)
<class '__main__.Foo'>
>>> dill.detect.badtypes(f, depth=1)
{'__hash__': <type 'method-wrapper'>, '__setattr__': <type 'method-wrapper'>, '__reduce_ex__': <type 'builtin_function_or_method'>, 'baz': <type 'instancemethod'>, '__reduce__': <type 'builtin_function_or_method'>, '__str__': <type 'method-wrapper'>, '__format__': <type 'builtin_function_or_method'>, '__getattribute__': <type 'method-wrapper'>, 'zap': <type 'instancemethod'>, '__delattr__': <type 'method-wrapper'>, '__repr__': <type 'method-wrapper'>, 'w': <type 'listiterator'>, '__dict__': <type 'dict'>, '__sizeof__': <type 'builtin_function_or_method'>, '__init__': <type 'method-wrapper'>}
>>> dill.detect.badobjects(f, depth=1)
{'__hash__': <method-wrapper '__hash__' of Foo object at 0x10f9b0050>, '__setattr__': <method-wrapper '__setattr__' of Foo object at 0x10f9b0050>, '__reduce_ex__': <built-in method __reduce_ex__ of Foo object at 0x10f9b0050>, 'baz': <bound method Foo.baz of <__main__.Foo object at 0x10f9b0050>>, '__reduce__': <built-in method __reduce__ of Foo object at 0x10f9b0050>, '__str__': <method-wrapper '__str__' of Foo object at 0x10f9b0050>, '__format__': <built-in method __format__ of Foo object at 0x10f9b0050>, '__getattribute__': <method-wrapper '__getattribute__' of Foo object at 0x10f9b0050>, 'zap': <bound method Foo.<lambda> of <__main__.Foo object at 0x10f9b0050>>, '__delattr__': <method-wrapper '__delattr__' of Foo object at 0x10f9b0050>, '__repr__': <method-wrapper '__repr__' of Foo object at 0x10f9b0050>, 'w': <listiterator object at 0x10f9b0550>, '__dict__': {'y': xrange(5), 'z': 7, 'w': <listiterator object at 0x10f9b0550>}, '__sizeof__': <built-in method __sizeof__ of Foo object at 0x10f9b0050>, '__init__': <method-wrapper '__init__' of Foo object at 0x10f9b0050>}

Hmmm. That's a lot. Of course, not all of these objects have to serialize for our object to serialize… however at least one of them is causing the failure.

The natural thing to do is look at the failure we are getting… So, what's the error that would be thrown? Maybe that will give a hint.

>>> dill.detect.errors(f)
PicklingError("Can't pickle <type 'listiterator'>: it's not found as __builtin__.listiterator",)

Aha, the listiterator is a bad object. Let's dig deeper by turning "trace" back on.

>>> dill.detect.trace(True)
>>> dill.pickles(f)
T2: <class '__main__.Foo'>
F2: <function _create_type at 0x10f94a668>
T1: <type 'type'>
F2: <function _load_type at 0x10f94a5f0>
T1: <type 'object'>
D2: <dict object at 0x10f9826e0>
Cm: <classmethod object at 0x10f9ad408>
T4: <type 'classmethod'>
F1: <function bar at 0x10f9aa9b0>
F2: <function _create_function at 0x10f94a6e0>
Co: <code object bar at 0x10f9a9130, file "<stdin>", line 2>
F2: <function _unmarshal at 0x10f94a578>
D1: <dict object at 0x10e8d6168>
D2: <dict object at 0x10f96b5c8>
F1: <function baz at 0x10f9aaa28>
Co: <code object baz at 0x10f9a9ab0, file "<stdin>", line 5>
D1: <dict object at 0x10e8d6168>
D2: <dict object at 0x10f969d70>
F1: <function <lambda> at 0x10f9aaaa0>
Co: <code object <lambda> at 0x10f9a9c30, file "<stdin>", line 8>
D1: <dict object at 0x10e8d6168>
D2: <dict object at 0x10f97d050>
D2: <dict object at 0x10e97b4b0>
Si: xrange(5)
F2: <function _eval_repr at 0x10f94acf8>
T4: <type 'listiterator'>
False

Indeed, it stops at the listiterator. However, notice (just above) that the xrange does pickle. So, let's replace the iter with an xrange.

>>> f.w = xrange(1,4)  
>>> dill.detect.trace(False)
>>> dill.pickles(f)
True
>>> 

Our object now pickles again.

dill has a bunch of other pickle detection tools built-in, including methods to trace which object points to which (useful for debugging recursive pickling failures).

I believe that cloudpickle also has some similar tools to dill for pickle debugging… but the main tool in either case is similar to what you have built.

Answered by Mike McKerns on Oct 28 '22 at 13:10