
Fast, low-memory, constant key-value database supporting concurrent and random access reads

I need an on-disk key-value store that is neither very large nor distributed. The use case is as follows:

  • The full DB will be a few GB in size
  • Both key and value are of constant size
  • It's a constant database. Once the entire database is written, I don't need to write any more entries (or will write only very infrequently)
  • Keys will be accessed in unpredictable order
  • Supporting concurrent reads by multiple processes is a must.
  • It has to be very fast because the readers will be accessing millions of keys in a tight loop, so it should come as close as possible to the performance of looping over an in-memory associative array (say, the STL's std::map)
  • Ideally it should allow one to set how much RAM to use; typically it should use a few hundred MB
  • Written in C or C++. An existing Python extension would be a big plus, but I can add that on my own

cdb and gdbm look like good choices, but I wanted to know whether there are more suitable ones. Pointers to relevant benchmarks, or even relevant anecdotal evidence, would be appreciated.

san asked Oct 08 '22 11:10



1 Answer

What database did you end up using?

If you like cdb and need a database larger than 4 GB, please have a look at mcdb. It was originally based on cdb and adds some performance enhancements plus support for 4 GB+ constant databases.

https://github.com/gstrauss/mcdb/

Python, Perl, Lua, and Ruby extensions are provided. mcdb is written in C and uses mmap under the hood, so it easily supports lock-free concurrent reads between threads and between processes. Since it is backed by a memory-mapped file, pages are mapped in from disk as needed, and memory usage stays effectively constant even as the number of processes accessing the database increases.
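To illustrate why a read-only memory-mapped file gives lock-free concurrent reads, here is a minimal sketch in Python using only the standard library. It is not mcdb's API or file format: the names `build`, `ConstDB`, and `demo`, the 8-byte key and value widths, and the sorted-records-plus-binary-search layout are all assumptions made for illustration (cdb-style databases use hash tables instead). Every process that opens the file this way shares the same page-cache pages, and since nothing is written after the build step, readers need no locks.

```python
import mmap
import os
import struct
import tempfile

KEY_SIZE = 8      # assumed fixed key width in bytes
VALUE_SIZE = 8    # assumed fixed value width in bytes
RECORD_SIZE = KEY_SIZE + VALUE_SIZE


def build(path, items):
    """Write (key, value) byte pairs as fixed-size records sorted by key."""
    with open(path, "wb") as f:
        for key, value in sorted(items):
            if len(key) != KEY_SIZE or len(value) != VALUE_SIZE:
                raise ValueError("keys and values must have constant size")
            f.write(key + value)


class ConstDB:
    """Read-only view over the record file, backed by a memory map."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # ACCESS_READ maps the file read-only; the OS pages data in on demand.
        # Note: mapping an empty file raises an error, so the file must be non-empty.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // RECORD_SIZE

    def get(self, key, default=None):
        """Binary search over the sorted fixed-size records."""
        lo, hi = 0, self._count
        while lo < hi:
            mid = (lo + hi) // 2
            off = mid * RECORD_SIZE
            probe = self._map[off:off + KEY_SIZE]
            if probe < key:
                lo = mid + 1
            elif probe > key:
                hi = mid
            else:
                return self._map[off + KEY_SIZE:off + RECORD_SIZE]
        return default

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Build a small database in a temp directory and look up two keys."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # Big-endian packing keeps byte order consistent with numeric order.
    items = [(struct.pack(">Q", k), struct.pack(">Q", k * k)) for k in range(1000)]
    build(path, items)
    db = ConstDB(path)
    try:
        value = db.get(struct.pack(">Q", 12))
        missing = db.get(struct.pack(">Q", 5000))
        return struct.unpack(">Q", value)[0], missing
    finally:
        db.close()
```

Calling `demo()` builds a 1000-entry file and looks up one present and one absent key. A hashed layout, as in cdb-style databases, would replace the O(log n) binary search with roughly constant-time probes, but the mmap-based sharing between reader processes works the same way.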

gstrauss answered Oct 10 '22 22:10
