I'm implementing a program that needs to serialize and deserialize large objects, so I was running some tests with the pickle, cPickle and marshal modules to choose the best one. Along the way I found something very interesting: I'm using dumps and then loads (for each module) on a list of dicts, tuples, ints, floats and strings.

This is the output of my benchmark:
DUMPING a list of length 7340032
----------------------------------------------------------------------
pickle => 14.675 seconds
length of pickle serialized string: 31457430
cPickle => 2.619 seconds
length of cPickle serialized string: 31457457
marshal => 0.991 seconds
length of marshal serialized string: 117440540
LOADING a list of length: 7340032
----------------------------------------------------------------------
pickle => 13.768 seconds
(same length?) 7340032 == 7340032
cPickle => 2.038 seconds
(same length?) 7340032 == 7340032
marshal => 6.378 seconds
(same length?) 7340032 == 7340032
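A benchmark of this shape can be sketched roughly as follows. This is a minimal Python 3 version (where cPickle's C implementation was folded into pickle, so only pickle and marshal are compared); the sample data is made up and far smaller than the 7-million-item list above:

```python
import marshal
import pickle
import time

# Hypothetical sample data mixing dicts, tuples, ints, floats and strings,
# loosely mirroring the question's list (but only 1000 items, not 7 million).
data = [(i, float(i), str(i), {"key": i}) for i in range(1000)]

for name, mod in [("pickle", pickle), ("marshal", marshal)]:
    start = time.perf_counter()
    blob = mod.dumps(data)          # DUMPING
    dump_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = mod.loads(blob)      # LOADING
    load_time = time.perf_counter() - start

    assert restored == data         # round-trip sanity check
    print("%-7s dump %.3fs, load %.3fs, %d bytes"
          % (name, dump_time, load_time, len(blob)))
```

The absolute timings will of course differ from the numbers above; the interesting part is the relative dump/load times and the serialized sizes.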
So, from these results we can see that marshal was extremely fast in the dumping part of the benchmark: 14.8x faster than pickle and 2.6x faster than cPickle.

But, to my big surprise, marshal was by far slower than cPickle in the loading part: 2.2x faster than pickle, but 3.1x slower than cPickle.

As for RAM, marshal's performance while loading was also very inefficient. I'm guessing the reason why loading with marshal is so slow is somehow related to the length of its serialized string (much longer than that of pickle and cPickle).
My questions are:

Why does marshal dump faster but load slower than cPickle?
Why is the marshal serialized string so long?
Why is marshal's loading so inefficient in RAM?
Is there a way to improve marshal's loading performance?
Is there a way to combine marshal's fast dumping with cPickle's fast loading?
Difference between pickle and cPickle: pickle is a pure-Python, class-based implementation, while cPickle is written as C functions. As a result, cPickle is many times faster than pickle.

cPickle supports most elementary data types (e.g., dictionaries, lists, tuples, numbers, strings) and combinations thereof, as well as classes and instances. Pickling classes and instances saves only the data involved, not the code.
cPickle has a smarter algorithm than marshal and is able to do tricks to reduce the space used by large objects. That means it'll be slower to encode but faster to decode, as the resulting output is smaller. marshal is simplistic and serializes the object straight as-is, without analyzing it any further. That also answers why marshal's loading is so inefficient: it simply has to do more work, as in reading more data from disk, to do the same thing as cPickle.
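The size difference is easy to reproduce on Python 3 (where pickle is backed by the old cPickle C code): pickle's binary protocols use compact, variable-length encodings (a small integer takes only a couple of bytes), whereas marshal writes each value in a fixed, straightforward layout. A small sketch with made-up data:

```python
import marshal
import pickle

# A list of small integers, illustrative only.
data = list(range(1000))

p = pickle.dumps(data)   # binary pickle protocol: compact encodings
m = marshal.dumps(data)  # marshal: simple fixed-width layout per value

print("pickle: %d bytes, marshal: %d bytes" % (len(p), len(m)))
assert len(p) < len(m)   # pickle's output is noticeably smaller here
```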
marshal and cPickle are really different things in the end; you can't get both fast saving and fast loading, since fast saving implies analyzing the data structures less, which implies writing a lot more data to disk.

Regarding the fact that marshal might be incompatible with other versions of Python, you should generally use cPickle:

"This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the pickle module instead – the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal." (the Python docs on marshal)
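As a practical footnote to that recommendation: on Python 3 the cPickle implementation became the default backend of pickle, and the choice of pickle protocol itself matters a lot for both size and speed. A hedged sketch (the data is made up):

```python
import pickle

# Illustrative data only.
data = [{"a": i, "b": float(i)} for i in range(1000)]

# Protocol 0 is the old ASCII format; HIGHEST_PROTOCOL is the most
# recent binary format supported by the running interpreter.
old = pickle.dumps(data, protocol=0)
new = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print("protocol 0: %d bytes, highest: %d bytes" % (len(old), len(new)))
assert pickle.loads(new) == data
assert len(new) < len(old)  # the binary protocol is considerably more compact
```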