
dill vs cPickle speed difference

I am trying to serialize thousands of objects and some of these objects are lambda objects.

Since cPickle doesn't work for lambdas, I tried using dill. However, unpickling (or undilling?) is more than 10 times slower. Looking through the source, it seems that dill uses pickle internally, which might be the reason for the slowdown.

Is there another option that combines the best of both modules?

EDIT: The most significant speed drop is during unpickling.

Tohiko asked Jun 19 '16 10:06



1 Answer

I'm the dill author. Yes, dill is typically slower, but that's the penalty you pay for more robust serialization. If you are serializing a lot of classes and functions, then you might want to try one of the dill variants in dill.settings. If you use byref=True, then dill will pickle several objects by reference (which is faster than the default). Other settings trade off picklability for speed in selected objects.

In [1]: import dill

In [2]: f = lambda x:x

In [3]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 286 us per loop

In [4]: dill.settings['byref'] = True

In [5]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 237 us per loop

In [6]: dill.settings
Out[6]: {'byref': True, 'fmode': 0, 'protocol': 2, 'recurse': False}

In [7]: dill.settings['recurse'] = True

In [8]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 408 us per loop

In [9]: class Foo(object):
   ...:     x = 1
   ...:     def bar(self, y):
   ...:         return y + self.x
   ...:     

In [10]: g = Foo()

In [11]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.6 us per loop

In [12]: dill.settings['recurse'] = False

In [13]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.4 us per loop

In [14]: dill.settings['byref'] = False

In [15]: %timeit dill.loads(dill.dumps(g))
1000 loops, best of 3: 499 us per loop
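
Not part of the original answer, but one way to approach the "best of both" part of the question is to try the fast C pickler first and only fall back to dill for objects it rejects (such as lambdas). The following is a minimal sketch under a few assumptions: it targets Python 3, where cPickle has been folded into pickle; the names hybrid_dumps, hybrid_loads, TAG_PICKLE and TAG_DILL are hypothetical helpers made up for illustration, not part of either library; and the one-byte tag is just one possible way to remember which loader to use.

```python
import pickle

try:
    import dill  # third-party; only needed for the fallback path
except ImportError:
    dill = None

# Hypothetical one-byte tags recording which serializer produced the payload.
TAG_PICKLE = b"P"
TAG_DILL = b"D"


def hybrid_dumps(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize with pickle when possible, otherwise fall back to dill."""
    try:
        return TAG_PICKLE + pickle.dumps(obj, protocol)
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle rejected the object (e.g. a lambda); retry with dill.
        if dill is None:
            raise
        return TAG_DILL + dill.dumps(obj, protocol=protocol)


def hybrid_loads(data):
    """Dispatch to the loader that matches the tag written by hybrid_dumps."""
    tag, payload = data[:1], data[1:]
    if tag == TAG_PICKLE:
        return pickle.loads(payload)
    if tag == TAG_DILL:
        if dill is None:
            raise RuntimeError("payload was written with dill, which is not installed")
        return dill.loads(payload)
    raise ValueError("unknown serializer tag: %r" % tag)


if __name__ == "__main__":
    plain = hybrid_dumps({"a": [1, 2, 3]})
    print(plain[:1], hybrid_loads(plain))
    if dill is not None:
        func = hybrid_loads(hybrid_dumps(lambda x: x + 1))
        print(func(2))
```

With this arrangement, only the objects that pickle cannot handle pay dill's cost on load. The trade-off is that a failed pickle attempt is wasted work on the dump side, and a container holding even one lambda falls back to dill as a whole.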

Mike McKerns answered Sep 25 '22 08:09