I am looking to write a Key/value store (probably in python) mostly just for experience, and because it's something I think that is a very useful product. I have a couple of questions. How, in general, are key/value pairs normally stored in memory and on disk? How would one go about loading the things stored on disk, back into memory? Do key/value stores keep all the key/value pairs in memory at once? or is it read from the disk?
I tried to find some literature on the subject, but didn't get very far and was hoping someone here could help me out.
It all depends on the level of complexity you want to dive into. Starting with a simple Python dict
serialized to a file in a myriad of possible ways (of which pickle is probably the simplest), you can go as far as implementing a complete database system.
Look up redis
- it's a key/value store written in C and operating as a server "DB". It has some good documentation and easy to read code, so you can borrow ideas for your Python implementation.
To go even farther, you can read about B-trees.
For your specific questions: above some DB size, you can never keep it all in memory, so you need some robust way of loading data from disk. Also consider whether the store is single-client or multi-client. This has serious consequences for its implementation.
Have a look at Python's shelve
module which provides a persitent dictionary. It basically stores pickles in a database, usually dmb or BSDDB. Looking at how shelve
works will give you some insights, and the source code comes with your python distribution.
Another product to look at is Durus. This is an object database, that it uses it's own B-tree implementation for persistence to disk.
If you're doing a key/value store in Python for learning purposes, it might be easiest to start with the pickle module. It's a fast and convenient way to write an arbitrary Python data stream to a persistent store and read it back again.
you can have a look at 'Berkley db' to see how how it works, it is a key/value DB, so you can use it directly, or as it is open-source see how it handles persistence, transactions and paging of most referred pages.
here are python bindings to it http://www.jcea.es/programacion/pybsddb.htm
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With