 

Fastest way to save and load a large dictionary in Python


I have a relatively large dictionary. How do I know its size? When I save it using cPickle, the file grows to approximately 400 MB. cPickle is supposed to be much faster than pickle, but loading and saving this file still takes a lot of time. I have a dual-core 2.6 GHz laptop with 4 GB RAM running Linux. Does anyone have suggestions for faster saving and loading of dictionaries in Python? Thanks.

Hossein asked Mar 09 '11



2 Answers

Use the protocol=2 option of cPickle. The default protocol (0) is much slower, and produces much larger files on disk.
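To make the difference concrete, here is a minimal sketch; the dictionary is made-up stand-in data, and `import pickle` is used since in Python 3 cPickle's C implementation lives behind the plain `pickle` name:

```python
import pickle

# Hypothetical stand-in for the large dictionary.
data = {i: str(i) * 10 for i in range(1000)}

# Protocol 0 is the slow ASCII-based default of cPickle.
slow = pickle.dumps(data, protocol=0)

# Protocol 2 is a compact binary format: smaller output, faster I/O.
fast = pickle.dumps(data, protocol=2)

print(len(slow) > len(fast))  # → True: the binary protocol is more compact
```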

If you just want to work with a larger dictionary than memory can hold, the shelve module is a good quick-and-dirty solution. It acts like an in-memory dict, but stores itself on disk rather than in memory. shelve is based on cPickle, so be sure to set your protocol to anything other than 0.
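A minimal sketch of that pattern, assuming a hypothetical file path (shelve may create more than one file with this prefix, and keys must be strings):

```python
import os
import shelve
import tempfile

# Hypothetical path for the on-disk store.
path = os.path.join(tempfile.mkdtemp(), "bigdict")

# protocol=2 avoids the slow default protocol 0.
with shelve.open(path, protocol=2) as db:
    db["key"] = [1, 2, 3]  # written through to disk, not held in RAM

# Reopen later and read back one entry at a time.
with shelve.open(path) as db:
    value = db["key"]

print(value)  # → [1, 2, 3]
```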

The advantages of a database like sqlite over cPickle will depend on your use case. How often will you write data? How many times do you expect to read each datum that you write? Will you ever want to perform a search of the data you write, or load it one piece at a time?

If you're doing write-once, read-many, and loading one piece at a time, by all means use a database. If you're doing write-once, read-once, cPickle (with any protocol other than the default protocol=0) will be hard to beat. If you just want a large, persistent dict, use shelve.
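For the write-once, read-many, piece-at-a-time case, a sqlite sketch might look like this; the `kv` table, the keys, and the in-memory database are made up for illustration (a real use would pass a file path to `connect`):

```python
import pickle
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")

# Write once: store each value as a pickled binary blob.
data = {"a": [1, 2], "b": [3, 4]}
conn.executemany(
    "INSERT INTO kv VALUES (?, ?)",
    [(k, pickle.dumps(v, 2)) for k, v in data.items()],
)
conn.commit()

# Read many: fetch one piece at a time instead of loading the whole dict.
row = conn.execute("SELECT value FROM kv WHERE key = ?", ("a",)).fetchone()
print(pickle.loads(row[0]))  # → [1, 2]
```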

Andrew answered Nov 06 '22


I know it's an old question, but as an update for those still looking for an answer: the protocol argument has been updated in Python 3, and there are now even faster and more efficient options (i.e. protocol=3 and protocol=4), which are not supported by Python 2. You can read more about it in the reference.

In order to always use the best protocol supported by the Python version you're using, you can simply use pickle.HIGHEST_PROTOCOL. The following example is taken from the reference:

```python
import pickle

# ...

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
```
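A matching load step would be a round trip like the sketch below (the `data` dictionary and the `data.pickle` filename follow the reference example); pickle.load detects the protocol from the file, so no protocol argument is needed when reading:

```python
import pickle

# Hypothetical stand-in data for the round trip.
data = {"answer": 42}

with open("data.pickle", "wb") as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

with open("data.pickle", "rb") as f:
    restored = pickle.load(f)  # protocol is detected automatically

print(restored == data)  # → True
```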
Moran Neuhof answered Nov 06 '22