Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

*large* python dictionary with persistence storage for quick look-ups

Tags:

I have a 400 million lines of unique key-value info that I would like to be available for quick look ups in a script. I am wondering what would be a slick way of doing this. I did consider the following but not sure if there is a way to disk map the dictionary and without using a lot of memory except during dictionary creation.

  1. pickled dictionary object : not sure if this is an optimum solution for my problem
  2. NoSQL type dbases : ideally want something which has minimum dependency on third party stuff plus the key-value are simply numbers. If you feel this is still the best option, I would like to hear that too. May be it will convince me.

Please let me know if anything is not clear.

Thanks! -Abhi

like image 742
Abhi Avatar asked Aug 06 '12 23:08

Abhi


2 Answers

If you want to persist a large dictionary, you are basically looking at a database.

Python comes with built in support for sqlite3, which gives you an easy database solution backed by a file on disk.

like image 60
Sam Mussmann Avatar answered Sep 22 '22 09:09

Sam Mussmann


In principle the shelve module does exactly what you want. It provides a persistent dictionary backed by a database file. Keys must be strings, but shelve will take care of pickling/unpickling values. The type of db file can vary, but it can be a Berkeley DB hash, which is an excellent light weight key-value database.

Your data size sounds huge so you must do some testing, but shelve/BDB is probably up to it.

Note: The bsddb module has been deprecated. Possibly shelve will not support BDB hashes in future.

like image 29
mhawke Avatar answered Sep 22 '22 09:09

mhawke