
Is Cassandra suitable to use as a primary data store?

I'm evaluating a storage platform for an upcoming project and keep coming back to Cassandra. For this project, losing any amount of data is unacceptable. So far we've used a relational database (Microsoft SQL Server), but the data is so varied and large that storing and querying it has become a problem.

Is Cassandra robust enough to use as a primary data store? Or should it only be used to mirror existing data to speed up access?

John Clayton asked Dec 04 '09 19:12



2 Answers

Anecdotally: yes. Twitter, Digg, Ooyala, SimpleGeo, Mahalo, and others are using or moving to Cassandra as a primary data store (http://n2.nabble.com/Cassandra-users-survey-td4040068.html).

Technically: yes. Besides supporting replication (including to multiple datacenters), each Cassandra node has an fsync'd commit log to make sure writes are durable. From there, writes are turned into SSTables, which are immutable until compaction (which combines multiple SSTables and garbage-collects old versions). Snapshotting is supported at any time, including an automatic snapshot before compaction.
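
To make the write path described above concrete, here is a minimal toy sketch in Python. It is not Cassandra's code or API: the `ToyStore` class and its method names are hypothetical, and the flushed tables are kept in memory as tuples purely for brevity (Cassandra writes SSTables to disk). It only illustrates the general idea of an fsync'd append-only log, immutable flushed tables, and a compaction step that discards old versions.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy model of a log-structured write path (hypothetical, not Cassandra)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "commit.log")
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # flushed immutable tables, oldest first (in memory here)
        self._replay()       # recover writes that were logged but never flushed

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                key, value = json.loads(line)
                self.memtable[key] = value

    def write(self, key, value):
        # Append to the commit log and fsync before treating the write as done.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memtable[key] = value

    def flush(self):
        # Freeze the memtable into an immutable table, then truncate the log.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        open(self.log_path, "w").close()

    def compact(self):
        # Merge all tables into one; newer tables win, so old versions are dropped.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # check the newest table first
            for k, v in table:
                if k == key:
                    return v
        return None
```

For example, after `store.write("a", 1)`, constructing a second `ToyStore` on the same directory (simulating a restart before any flush) recovers `"a"` by replaying the log. Two flushes of different values for `"a"` leave two tables, and `compact()` merges them into one that holds only the latest value.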

jbellis answered Jan 04 '23 00:01



Whether Cassandra suits your application depends on your data workload. Cassandra is optimised for write-intensive workloads, so it is suitable for applications where a large amount of data needs to be inserted (such as infrastructure logging information at Facebook).

If, however, you require fast retrievals and insertion speed is not an issue, then you might look at something like HBase (which is optimised for read-intensive workloads).

Irfan answered Jan 04 '23 01:01
