
Voldemort vs. CouchDB

I am trying to decide whether to use Voldemort or CouchDB for an upcoming healthcare project. I want a storage system that has high availability and fault tolerance, and that can scale for the massive amounts of data being thrown at it.

What are the pros and cons of each?

Thanks

py213py asked Mar 01 '09 23:03


3 Answers

Project Voldemort looks nice, but I haven't looked deeply into it so far.

In its current state, CouchDB might not be the right thing for "massive amounts of data". Distributing data between nodes and routing queries accordingly is on the roadmap but not implemented so far. The biggest known production setups of CouchDB use "tables" ("databases" in Couch-speak) of about 200 GB.

HA is not natively supported by CouchDB but can be built easily: all CouchDB nodes replicate the databases between each other in a multi-master setup. We put two Varnish proxies in front of the CouchDB machines, and the Varnish boxes are made redundant with CARP. CouchDB's "built from the Web" design makes such things very easy.
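
As a rough sketch of the multi-master idea: CouchDB triggers replication with a POST to `/_replicate` whose JSON body names a `source` and a `target`. The host names and the `patients` database below are hypothetical, and the snippet only builds the request for each direction between the nodes; it does not send anything.

```python
import json
from itertools import permutations

def replication_requests(nodes, db, continuous=True):
    """Build one /_replicate request per ordered pair of nodes.

    Returns a list of (url, json_body) tuples; nothing is sent.
    """
    requests = []
    for source, target in permutations(nodes, 2):
        body = {
            "source": f"{source}/{db}",
            "target": f"{target}/{db}",
            "continuous": continuous,
        }
        requests.append((f"{source}/_replicate", json.dumps(body)))
    return requests

# Hypothetical node addresses and database name, for illustration only.
nodes = ["http://couch-a:5984", "http://couch-b:5984"]
for url, body in replication_requests(nodes, "patients"):
    print("POST", url, body)
```

With two nodes this yields two requests, one per direction, which is what makes the pair multi-master rather than master/slave.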

The most pressing issue in our setup is that replication of large (multi-MB) attachments to CouchDB documents still has problems.

I suggest you also check the traditional RDBMS route. Talent is much harder to find outside the RDBMS world, and there are very capable offerings available from Oracle & Co.

max answered Nov 08 '22 09:11


Without knowing more than what is in your question, I would still say that Project Voldemort, or distributed key-value and document stores like CouchDB in general, are a solution to your HA problem.

These stores are very good for high availability, but when it comes to consistency they are harder to write code for than traditional relational databases (RDBMS).

They are quite good at storing document-type information, which may fit nicely with your healthcare project, but they make handling the data harder during development.

  • The biggest limitation of most of these stores is that they are not transactionally safe (see Scalaris for a transactionally safe store), so you need to ensure data consistency yourself; most use read-time consistency, merging conflicting data on read. An RDBMS is much easier to use for data consistency (ACID).
  • Joining data is much harder too. In an RDBMS you can easily query data across several tables; in CouchDB you need to write code to aggregate data. For other stores, Hadoop may be a good choice for aggregating information.
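
To make "merging conflicting data at read time" concrete, here is a minimal sketch. It assumes a deliberately simple rule (per field, the newest timestamp wins) and a hypothetical record layout; it is not how Voldemort, CouchDB, or any particular store resolves conflicts.

```python
def merge_versions(versions):
    """Merge conflicting versions of one record at read time.

    Each version maps field -> (timestamp, value). For every field the
    entry with the newest timestamp wins (last-write-wins per field).
    """
    merged = {}
    for version in versions:
        for field, (ts, value) in version.items():
            if field not in merged or ts > merged[field][0]:
                merged[field] = (ts, value)
    return merged

# Two hypothetical replicas that diverged while disconnected.
replica_a = {"phone": (1, "555-0100"), "allergy": (5, "penicillin")}
replica_b = {"phone": (3, "555-0199"), "allergy": (2, "none")}
print(merge_versions([replica_a, replica_b]))
```

The point is that this logic lives in your application code, whereas an ACID database would have prevented the conflicting writes in the first place.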

Read about BASE and the CAP theorem on consistency vs. availability.

See

  • http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
  • http://queue.acm.org/detail.cfm?id=1394128

KingOfCoders answered Nov 08 '22 10:11


Is MemcacheDB an option? I've heard that's how Digg handled HA issues.

scunliffe answered Nov 08 '22 10:11