
cdb - constant key-value store for large files (hundreds of GB)

I need a tool similar to cdb (constant database) that would allow me to store large sets of data (in the range of hundreds of gigabytes) in indexed files. cdb would be an ideal candidate, but it has a 2 GB file size limit, so it's not suitable. The functionality I'm looking for is a persistent key-value store supporting binary keys and values. After creation, the database is read-only and will never be modified. Can you recommend a tool? And by the way, the storage overhead should be small because I will be storing billions of records.

BTW, I'm looking for an embeddable database management library, not a standalone server; something that can be used inside a C program.

Thanks, RG

asked Mar 15 '12 by nightwatch

2 Answers

Another option is mcdb, which is extended from Dan J. Bernstein's cdb.

https://github.com/gstrauss/mcdb/

mcdb supports very large constant databases and is faster than cdb for both database creation and database access. Still, creating a database of hundreds of gigabytes can take a while: mcdb can create a gigabyte-sized database in a few seconds when the data is in the filesystem cache, or in a minute or so starting from a cold cache.

https://github.com/gstrauss/mcdb/blob/master/t/PERFORMANCE

(Disclosure: I am the author of mcdb)

answered Sep 30 '22 by gstrauss

There's hamsterdb (I'm the author), BerkeleyDB, and Tokyo Cabinet.

hamsterdb uses a B-tree and therefore keeps your data sorted. Tokyo Cabinet is a hash table and therefore unsorted. BerkeleyDB can do both.

Needless to say what I would recommend ;)

All of them can be linked into a C application. None of them should have a 2GB limit.

bye Christoph

answered Sep 30 '22 by cruppstahl