Fastest way to save and load a large dictionary in Python

I have a relatively large dictionary. How do I know the size? Well, when I save it using cPickle, the file grows to approx. 400 MB. cPickle is supposed to be much faster than pickle, but loading and saving this file still takes a lot of time. I have a dual-core 2.6 GHz laptop with 4 GB of RAM running Linux. Does anyone have suggestions for faster saving and loading of dictionaries in Python? Thanks.

784

asked Mar 09 '11 16:03

Hossein

2 Answers

Use the protocol=2 option of cPickle. The default protocol (0) is much slower, and produces much larger files on disk.
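A minimal sketch of the size difference, written for Python 3. There, `cPickle` no longer exists as a separate module: `pickle` uses its C implementation automatically when available, and the default protocol is no longer 0, so the protocols are passed explicitly here. The float-valued dictionary and the `data_p2.pickle` file name are hypothetical stand-ins for the real data.

```python
import os
import pickle
import tempfile

# Hypothetical sample data standing in for the "relatively large dictionary".
data = {i: i / 7.0 for i in range(100000)}

# Protocol 0 is the original text-based format; protocol 2 is binary.
text_size = len(pickle.dumps(data, protocol=0))
binary_size = len(pickle.dumps(data, protocol=2))
print("protocol 0:", text_size, "bytes")
print("protocol 2:", binary_size, "bytes")

# Save and reload with protocol 2 (Python 2 equivalent: cPickle.dump(data, f, 2)).
path = os.path.join(tempfile.gettempdir(), "data_p2.pickle")
with open(path, "wb") as f:
    pickle.dump(data, f, protocol=2)
with open(path, "rb") as f:
    restored = pickle.load(f)  # the protocol is detected automatically on load
```

Floats show the gap clearly because protocol 0 writes each one as its decimal text, while the binary protocols use a fixed 8-byte encoding.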

The advantages of a database like sqlite over cPickle will depend on your use case. How often will you write data? How many times do you expect to read each datum that you write? Will you ever want to perform a search of the data you write, or load it one piece at a time?
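To illustrate the "load it one piece at a time" case, here is a minimal sketch using the standard-library `sqlite3` module as a key/value store. The table name `kv`, the file name, and the idea of pickling each value into a BLOB column are assumptions for illustration, not the only way to lay out the data.

```python
import os
import pickle
import sqlite3
import tempfile

# Hypothetical file name; any path works.
db_path = os.path.join(tempfile.gettempdir(), "big_dict.sqlite")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

# Write once: store each dictionary entry as its own row, pickling the value.
data = {"word%d" % i: [i, i * 2] for i in range(10000)}
with conn:  # commits the transaction on success, rolls back on error
    conn.executemany(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
        ((k, pickle.dumps(v, protocol=pickle.HIGHEST_PROTOCOL)) for k, v in data.items()),
    )

def lookup(key):
    """Read many: fetch and unpickle a single entry, not the whole dictionary."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return None if row is None else pickle.loads(row[0])

value = lookup("word42")
missing = lookup("no-such-key")
print(value, missing)
conn.close()
```

Because `key` is the primary key, each lookup uses an index instead of scanning or unpickling everything.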

If you're doing write-once, read-many, and loading one piece at a time, by all means use a database. If you're doing write-once, read-once, cPickle (with any protocol other than the default protocol=0) will be hard to beat. If you just want a large, persistent dict, use shelve.
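For the "large, persistent dict" case, a minimal `shelve` sketch (Python 3, where `shelve.open` works as a context manager). Keys must be strings, and each value is pickled separately, so reading one key does not unpickle the rest. The path and the sample values are hypothetical; depending on the `dbm` backend, shelve may create one or more files with extra suffixes.

```python
import os
import shelve
import tempfile

# Hypothetical path for the shelf.
path = os.path.join(tempfile.gettempdir(), "big_dict_shelf")

# Write: flag="n" always creates a new, empty shelf; protocol=2 as suggested above.
with shelve.open(path, flag="n", protocol=2) as shelf:
    for i in range(10000):
        shelf["key%d" % i] = {"id": i, "square": i * i}

# Read: open read-only and unpickle only the entries you ask for.
with shelve.open(path, flag="r") as shelf:
    entry = shelf["key123"]
    has_last = "key9999" in shelf

print(entry, has_last)
```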

190

answered Nov 06 '22 06:11

Andrew

I know it's an old question, but as an update for those who are still looking for an answer: the protocol argument has been updated in Python 3, and there are now even faster and more efficient options (i.e. protocol=3 and protocol=4), which are not supported by Python 2. You can read more about it in the reference.
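Since speed depends on the data, a rough way to compare the protocols on your own machine is to time them directly. This is a minimal sketch: the sample dictionary is hypothetical, `time.perf_counter` gives wall-clock timings that will vary between runs, and it loops over every protocol up to `pickle.HIGHEST_PROTOCOL` for whatever interpreter runs it.

```python
import pickle
import time

# Hypothetical test dictionary; substitute your own data.
data = {"key%d" % i: (i, float(i), "v" * 10) for i in range(100000)}

results = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    start = time.perf_counter()
    blob = pickle.dumps(data, protocol=proto)
    dump_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = pickle.loads(blob)
    load_time = time.perf_counter() - start

    assert restored == data  # every protocol must round-trip the dict
    results[proto] = (len(blob), dump_time, load_time)
    print("protocol %d: %9d bytes, dump %.3fs, load %.3fs"
          % (proto, len(blob), dump_time, load_time))
```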

To always use the best protocol supported by the Python version you're using, you can simply use pickle.HIGHEST_PROTOCOL. The following example is taken from the reference:

    import pickle
    # ...
    with open('data.pickle', 'wb') as f:
        # Pickle the 'data' dictionary using the highest protocol available.
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
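The matching load step needs no protocol argument, because `pickle.load` detects the protocol from the file itself. In this sketch the `data` dictionary is a hypothetical stand-in, since the reference snippet elides where it comes from.

```python
import pickle

# Hypothetical stand-in for the elided 'data' dictionary.
data = {"a": [1, 2.0, 3 + 4j], "b": ("string", b"byte string"), "c": {None, True, False}}

# Save with the highest protocol this interpreter supports.
with open("data.pickle", "wb") as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

# Load it back; the protocol version is detected automatically.
with open("data.pickle", "rb") as f:
    loaded = pickle.load(f)

print(loaded == data)
```

A file written with a newer protocol cannot be read by an interpreter that does not support that protocol, so pick a lower fixed protocol if older Python versions must read the file.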

answered Nov 06 '22 07:11

Moran Neuhof
