dill vs cPickle speed difference

Tags: python, lambda, pickle, dill

I am trying to serialize thousands of objects, and some of these objects are lambdas.

Since cPickle can't serialize lambdas, I tried dill instead. However, unpickling (or "undilling"?) is more than 10 times slower. Looking through the source, it seems that dill uses the pure-Python pickle module internally rather than cPickle, which might be the reason for the slowdown.
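For context, a minimal sketch of the failure being described, using only the standard library. In Python 3 there is no separate cPickle module; `pickle` uses its C implementation automatically. The helper `is_stdlib_picklable` is a hypothetical name used only for illustration. The standard pickler stores functions by module and qualified name, and a lambda's name is `<lambda>`, which cannot be looked up, so dumping it fails.

```python
import pickle


def is_stdlib_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Functions are written as "module + qualified name"; a lambda's
        # qualified name is "<lambda>", which pickle cannot resolve, so it
        # refuses to write it (the exact exception type varies by version).
        return False
    return True


print(is_stdlib_picklable(len))          # True: found as builtins.len
print(is_stdlib_picklable(lambda x: x))  # False: no importable name
```

dill gets around this by serializing the function's code itself rather than a name, which is more work on both the dumping and the loading side.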

Is there another option for me that combines the best of both modules?

EDIT: The most significant slowdown is during unpickling.
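Since the slowdown is mostly on the loading side, it helps to time `dumps` and `loads` separately rather than a full round trip. A minimal sketch using the standard `timeit` module; `time_dump_and_load` is a hypothetical helper that accepts any module exposing `dumps` and `loads`, which both `pickle` and `dill` do. The stand-in data is an assumption, not the asker's objects.

```python
import pickle
import timeit


def time_dump_and_load(serializer, obj, number=1000):
    """Return (seconds per dumps call, seconds per loads call) for obj.

    `serializer` is any module exposing dumps/loads, e.g. pickle or dill.
    time_dump_and_load is a hypothetical helper, not a library API.
    """
    payload = serializer.dumps(obj)  # serialize once so loads is timed on its own
    dump_total = timeit.timeit(lambda: serializer.dumps(obj), number=number)
    load_total = timeit.timeit(lambda: serializer.loads(payload), number=number)
    return dump_total / number, load_total / number


if __name__ == "__main__":
    # Stand-in data: thousands of small, plainly picklable objects.
    data = [{"id": i, "name": str(i)} for i in range(5000)]
    dump_s, load_s = time_dump_and_load(pickle, data, number=50)
    print(f"pickle: dumps {dump_s * 1e6:.1f} us, loads {load_s * 1e6:.1f} us")
```

With dill installed, `time_dump_and_load(dill, obj)` on the same `obj` gives directly comparable numbers; results depend heavily on the machine and on the objects involved.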


asked Jun 19 '16 10:06

Tohiko

1 Answer

I'm the dill author. Yes, dill is typically slower, but that's the penalty you pay for more robust serialization. If you are serializing a lot of classes and functions, then you might want to try one of the dill variants in dill.settings. If you use byref=True, then dill will pickle several objects by reference (which is faster than the default). Other settings trade off picklability for speed in selected objects.
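What "by reference" means can be seen with the standard `pickle` module, which already stores module-level functions this way: the payload records only a module name and a qualified name, and loading looks the object up again instead of rebuilding it, which is why it is cheap. A minimal sketch; `is_stored_by_reference` is a hypothetical helper. dill's `byref` setting applies this idea to more kinds of objects; exactly which ones is up to dill.

```python
import json
import math
import pickle


def is_stored_by_reference(obj):
    """True if a pickle round trip returns the very same object.

    That happens when pickle writes only a module name plus a qualified
    name, and the loader simply looks the object up again.
    """
    return pickle.loads(pickle.dumps(obj)) is obj


payload = pickle.dumps(json.dumps)
print(b"json" in payload and b"dumps" in payload)  # True: only names are stored
print(len(payload))                                 # small, since no bytecode is included
print(is_stored_by_reference(json.dumps))           # True
print(is_stored_by_reference(math.sqrt))            # True
```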

In [1]: import dill

In [2]: f = lambda x:x

In [3]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 286 us per loop

In [4]: dill.settings['byref'] = True

In [5]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 237 us per loop

In [6]: dill.settings
Out[6]: {'byref': True, 'fmode': 0, 'protocol': 2, 'recurse': False}

In [7]: dill.settings['recurse'] = True

In [8]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 408 us per loop

In [9]: class Foo(object):
   ...:     x = 1
   ...:     def bar(self, y):
   ...:         return y + self.x
   ...:     

In [10]: g = Foo()

In [11]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.6 us per loop

In [12]: dill.settings['recurse'] = False

In [13]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.4 us per loop

In [14]: dill.settings['byref'] = False

In [15]: %timeit dill.loads(dill.dumps(g))
1000 loops, best of 3: 499 us per loop

In [16]:
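The session above changes the global `dill.settings` dict in place, so each later timing inherits the earlier changes (In [11], for example, runs with both `byref` and `recurse` set to True). A minimal sketch of a context manager that scopes such a change and restores the previous values afterwards; `temporary_settings` is a hypothetical helper, and it assumes only that `dill.settings` is a plain dict, as Out[6] shows.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for keys that did not exist before


@contextmanager
def temporary_settings(settings, **overrides):
    """Temporarily update a settings dict, restoring the old values on exit.

    Hypothetical helper; works on any dict, e.g. dill.settings.
    """
    saved = {key: settings.get(key, _MISSING) for key in overrides}
    settings.update(overrides)
    try:
        yield settings
    finally:
        # Restore even if the body raised an exception.
        for key, old in saved.items():
            if old is _MISSING:
                settings.pop(key, None)
            else:
                settings[key] = old


# Usage, assuming dill is installed:
#     with temporary_settings(dill.settings, byref=True):
#         payload = dill.dumps(g)
if __name__ == "__main__":
    demo = {"byref": False, "recurse": False}
    with temporary_settings(demo, byref=True):
        print(demo)  # {'byref': True, 'recurse': False}
    print(demo)      # {'byref': False, 'recurse': False}
```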


answered Sep 25 '22 08:09

Mike McKerns
