HBase has a master-slave model, while Cassandra has a peer-to-peer model. I am aware that in a master-slave model, the master is a SPOF (Single Point of Failure) and there is no such thing in a peer-to-peer model.
Are there any other pros and cons of each model? Specifically, I am looking for any advantages of the master-slave model over the peer-to-peer model.
One side point: the master does not have to be a SPOF in HBase, because you can run a multi-master configuration. See http://wiki.apache.org/hadoop/Hbase/MultipleMasters
Having masters makes it a little easier to know where the data is and where it is going. HBase is also built on Hadoop, so the integration with MapReduce is quite nice: a map job naturally splits across the region servers, and each map call is handed a row. I think this is the main plus.
Cassandra's primary "con" is its eventual consistency model, although it does let you choose the consistency level for each read and write.
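To make the consistency trade-off concrete, here is a minimal sketch of the usual overlap rule for tunable consistency: a read is guaranteed to touch at least one replica holding the latest acknowledged write when the number of replicas written plus the number read exceeds the replication factor. The function names are hypothetical and this is plain arithmetic, not a call into any Cassandra driver.

```python
def quorum(replication_factor):
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (W + R > N)."""
    return write_replicas + read_replicas > replication_factor


# Hypothetical cluster with 3 replicas per row.
n = 3
strong = reads_see_latest_write(n, quorum(n), quorum(n))  # quorum writes and reads
weak = reads_see_latest_write(n, 1, 1)                    # one replica each way
```

With three replicas, quorum writes plus quorum reads overlap, while writing and reading a single replica does not, which is where the "eventual" in eventual consistency shows up.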
One comparison point is that data in HBase is sorted by key, whereas Cassandra by default distributes keys across nodes effectively at random. Sorting can provide some benefits if you design smart keys in HBase, although you can always choose a GUID or random key to emulate Cassandra's behavior. Cassandra can also be configured to partition non-randomly, but HBase is still better for range scans.
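A rough sketch of why key ordering matters for range scans, using only the Python standard library. The key layout, the node count, and the `hash_node` helper are assumptions made for illustration; neither function reflects the real HBase or Cassandra implementation.

```python
import bisect
import hashlib

# Hypothetical "smart" row keys: a user id followed by a date.
keys = [f"user{u}:2024-01-{d:02d}" for u in (1, 2, 3) for d in (1, 2, 3)]


def range_scan(sorted_keys, start, stop):
    """Return the keys in [start, stop) from a sorted list with two binary searches."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def hash_node(key, num_nodes):
    """Pick a node by hashing the key (illustrative only, not a real partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


sorted_keys = sorted(keys)

# Sorted storage: all rows for user2 sit next to each other.
# ";" is the character right after ":" in ASCII, so it bounds the "user2:" prefix.
user2_rows = range_scan(sorted_keys, "user2:", "user2;")

# Hashed placement: the same rows may land on different nodes,
# so the same query has to consult several of them.
user2_nodes = {hash_node(k, 4) for k in user2_rows}
```

In the sorted case the scan reads one contiguous slice; with hashed placement the rows for a single user are located independently of one another, which is the cost the answer refers to.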
I've used both, and they both work, and both take a lot of work to keep working.