The target application is a medium-sized website built to support several hundred to several thousand users an hour, with the option to scale beyond that. The data model is rather simple, and the caching potential is pretty high (~10:1 ratio of read to edit actions).
What should the considerations be when choosing between a relational, SQL-based datastore and a NoSQL option (such as HBase or Cassandra)?
If your data is very structured and ACID compliance is a must, SQL is a great choice. On the other hand, if your data requirements aren't clear or if your data is unstructured, NoSQL may be your best bet. Data stored in a NoSQL database does not need a predefined schema the way data in a SQL database does.
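To make the schema point concrete, here is a minimal sketch. It assumes SQLite (from Python's standard library) as a stand-in for a SQL database and a plain dict of JSON strings as a stand-in for a schemaless store; the table and field names are made up for illustration.

```python
import json
import sqlite3

# Relational side: the schema is declared up front and enforced on every write.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("alice", "alice@example.com"),
)

try:
    # No value for the required email column, so the database rejects the row.
    conn.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Schemaless side (stand-in): each document carries its own shape.
documents = {}
documents["u1"] = json.dumps({"name": "alice", "email": "alice@example.com"})
documents["u2"] = json.dumps({"name": "bob", "tags": ["beta"]})  # no email, extra field

sql_row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The trade-off is where validation lives: the SQL table refuses the malformed row, while the document store accepts anything and leaves the checking to application code.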
The biggest difference between NoSQL systems lies in the ability to query data efficiently. Document databases provide the richest query functionality, which allows them to address a wide variety of applications. Key-value stores and wide column stores provide a single means of accessing data: by primary key.
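A rough illustration of that difference, using an in-memory dict rather than any real database (the keys, fields and helper names are hypothetical):

```python
# Key-value style: the only access path is the primary key.
kv_store = {
    "user:1": {"name": "alice", "city": "Paris"},
    "user:2": {"name": "bob", "city": "Berlin"},
    "user:3": {"name": "carol", "city": "Paris"},
}


def kv_get(key):
    """Single lookup by primary key, the one operation a pure k/v store offers."""
    return kv_store.get(key)


def find_by_field(field, value):
    """Document-style query on an arbitrary field.

    A document database would normally answer this from a secondary index.
    Here we scan every record, which is what a pure k/v store would force
    the client to do.
    """
    return sorted(k for k, doc in kv_store.items() if doc.get(field) == value)
```

If most of your reads look like `kv_get`, a key-value or wide column store fits well; if many look like `find_by_field`, you want a store that can index those fields for you.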
Most NoSQL stores don't enforce relations between data types. Running join-style queries across related data is doable, but typically much slower. If you have a high-transaction application, SQL databases are a better fit: for heavy-duty or complex transactions they are more stable and ensure data integrity.
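As a sketch of what "ensure data integrity" buys you, here is a two-step transfer that either fully happens or not at all. It assumes SQLite as the SQL database; the `accounts` table and the `transfer` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; returns True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

An overdraft attempt leaves both rows untouched, even though the credit statement had already run inside the transaction.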
To me, you don't have any particular problem to solve. If you need ACIDity, use a relational database; if you don't, then it doesn't matter. In the end, just build your app. And let me quote "NoSQL: If Only It Was That Easy":
The real thing to point out is that if you are being held back from making something super awesome because you can’t choose a database, you are doing it wrong. If you know mysql, just use it. Optimize when you actually need to. Use it like a k/v store, use it like a rdbms, but for god sake, build your killer app! None of this will matter to most apps. Facebook still uses MySQL, a lot. Wikipedia uses MySQL, a lot. FriendFeed uses MySQL, a lot. NoSQL is a great tool, but it’s certainly not going to be your competitive edge, it’s not going to make your app hot, and most of all, your users won’t give a shit about any of this.
Digg has some interesting articles on this question. Essentially, with NoSQL you're shifting the burden of processing to writes rather than reads, which may be desirable in highly scalable applications. Cassandra specifically is also highly available.
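"Shifting the burden to writes" roughly means precomputing, at write time, the view each reader will ask for. A toy sketch of that idea with plain Python dicts follows; the follower data and function names are made up, and this is not how Digg or Cassandra actually implement it.

```python
from collections import defaultdict

# Hypothetical follower graph: author -> readers who follow that author.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}

# Denormalized, precomputed view: one ready-made timeline per reader.
timelines = defaultdict(list)


def post(author, text):
    """Write path does the heavy lifting: copy the post to every follower."""
    for reader in followers.get(author, []):
        timelines[reader].append((author, text))


def read_timeline(reader):
    """Read path is a single key lookup, with no joins or aggregation."""
    return list(timelines.get(reader, []))
```

Each write costs one append per follower, but every read is a single lookup, which suits a workload where reads heavily outnumber edits.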
Simplistically, Cassandra is a distributed database with a BigTable data model running on a Dynamo-like infrastructure. It is column-oriented and allows for the storage of relatively structured data. It has a fully decentralized model; every node is identical and there is no single point of failure. It's also extremely fault tolerant; data is replicated to multiple nodes and across data centers. Cassandra is also very elastic; read and write throughput increase linearly as new machines are added.
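To give a feel for the Dynamo-style part, here is a toy consistent-hash ring that picks replica nodes for a key. It is a simplified sketch, not Cassandra's actual partitioner or replication strategy; the `Ring` class, the node names and the use of MD5 are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: every node is identical and owns one token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next distinct nodes."""
        count = min(self.replication_factor, len(self.ring))
        start = bisect.bisect_right(self.ring, (token(key), ""))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]
```

Because any node can compute `replicas(key)` on its own, no coordinator is needed, and when one node disappears the key's other replicas keep holding the data.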