I recently asked a question about how to save large Python objects to file. I had previously run into problems converting massive Python dictionaries to strings and writing them to file via write(). Now I am using pickle. Although it works, the files are incredibly large (> 5 GB). I have little experience with files this large. I wanted to know if it would be faster, or even possible, to zip this pickle file prior to storing it on disk.
Pure Python code is extremely slow when it comes to data serialization. If you tried to write an equivalent of pickle in pure Python, you'd see that it is super slow. Fortunately, the built-in modules that do this are quite good.
Apart from cPickle, there is the marshal module, which is a lot faster. But it needs a real file handle (not a file-like object). You can import marshal as Pickle and see the difference.
I don't think you can write a custom serializer that is much faster than this...
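As a rough, hypothetical comparison (the file names, test data, and timing code here are my own illustration, not from the answer), you could time both modules side by side like this:

import marshal
import pickle  # cPickle on Python 2
import time

data = {i: {'x': i, 'y': i * 2} for i in range(1000000)}

# pickle works with any file-like object
start = time.time()
with open('data.pickle', 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
print('pickle:', time.time() - start)

# marshal expects a real file object opened in binary mode
start = time.time()
with open('data.marshal', 'wb') as f:
    marshal.dump(data, f)
print('marshal:', time.time() - start)

Keep in mind that marshal is meant for Python's internal use (.pyc files), so its format is not guaranteed to be stable across Python versions.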
Here's an actual (not so old) serious benchmark of Python serializers
You can compress the data with bzip2:
from __future__ import with_statement  # Only needed on Python 2.5
import bz2, json, contextlib

hugeData = {'key': {'x': 1, 'y': 2}}
with contextlib.closing(bz2.BZ2File('data.json.bz2', 'wb')) as f:
    json.dump(hugeData, f)
Load it like this:
from __future__ import with_statement  # Only needed on Python 2.5
import bz2, json, contextlib

with contextlib.closing(bz2.BZ2File('data.json.bz2', 'rb')) as f:
    hugeData = json.load(f)
You can also compress the data using zlib or gzip with pretty much the same interface. However, the compression ratios of zlib and gzip will generally be lower than what you get with bzip2 (or lzma).
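For instance, here is a minimal sketch of the same pattern with gzip, using pickle instead of json since that is what the question uses (the file name is just an example):

import gzip
import pickle

hugeData = {'key': {'x': 1, 'y': 2}}

# gzip.open returns a file-like object, so pickle can write to it directly
with gzip.open('data.pickle.gz', 'wb') as f:
    pickle.dump(hugeData, f, protocol=pickle.HIGHEST_PROTOCOL)

with gzip.open('data.pickle.gz', 'rb') as f:
    hugeData = pickle.load(f)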