
What makes Cassandra (and NoSQL in general) a better solution to an RDBMS?


Well, NoSQL is a buzzword right now so I've been looking into it. I'm yet to get my head around ColumnFamilies and SuperColumns, etc... But I have been looking at how the data is mapped.

After reading this article, and others, it seems the data is mapped in a JSON-like format.

    Users = {
        1: {
            username: "dave",
            password: "blahblah",
            dateReged: "1/1/1"
        },
        2: {
            username: "etc",
            password: "blahblah",
            dateReged: "2/1/1",
            comment: "this guy has a comment and dave doesn't"
        },
    }

The RDBMS format would be:

    Table name: "Users"

     id | username | password | dateReged | comment
    ----+----------+----------+-----------+-----------------------------------------
      1 | dave     | blahblah | 1/1/1     |
    ----+----------+----------+-----------+-----------------------------------------
      2 | etc      | blahblah | 2/1/1     | this guy has a comment and dave doesn't

Assuming I understand this correctly and my above examples are right, why would I choose the RDBMS design over the NoSQL design? Personally, I'd much rather work with the JSON structure... Does this mean I should choose NoSQL over, say, MySQL?

I guess what I'm asking is "when should I choose NoSQL over RDBMS?"

On a side note, as I've said, I still don't fully understand how to go about implementing a Cassandra database. I.e., how do I create the above Users table in a new database? Any tutorials, documentation, etc. you could point to would be great. My googling hasn't turned up much in terms of 'starting from scratch'...

dave asked Sep 09 '10 01:09



2 Answers

If you are Google, then you might be in a position where a NoSQL database would be easier on you than an RDBMS. Since you are not, the many advantages an RDBMS provides will probably be of some use to you. Significantly, on a single node, NoSQL offers absolutely no advantages over RDBMSes. RDBMSes offer lots of advantages over NoSQL, though. What are they?

RDBMSes use some pretty deep magic to understand the data they own, and the data you are asking for, in such a way that they can return that data in the most efficient manner possible. If you didn't ask about some column, the RDBMS doesn't waste any effort retrieving it. If you are interested in rows that have fields in common across two tables (this is a join, by the way), the RDBMS doesn't have to check every single pair of rows for matches, whereas a NoSQL database usually just gives you everything and makes you do the checking. With an RDBMS, you can usually construct queries that are actually 'about' the data you are using, like "if the date is a Tuesday", and if your indexes support it (if you run that query a lot, you would add such an index), you can get those rows efficiently.
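To make the join and the "Tuesday" query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `posts` tables, their columns, and the sample rows are made up for illustration; SQLite is only a stand-in for whatever RDBMS you would actually use.

```python
import sqlite3

# In-memory database with two hypothetical tables (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,
                        title TEXT, posted_on TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "dave", "blahblah"), (2, "etc", "blahblah")])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)",
                 [(1, 1, "Hello", "2010-09-07"),    # a Tuesday
                  (2, 2, "NoSQL?", "2010-09-08"),   # a Wednesday
                  (3, 1, "Again", "2010-09-14")])   # a Tuesday


def tuesday_posts(conn):
    """Join posts to their authors, keeping only posts made on a Tuesday.

    Only two columns are requested, so the password column is never returned.
    strftime('%w', ...) gives the weekday as '0'..'6' with Sunday = '0'.
    """
    return conn.execute("""
        SELECT u.username, p.title
        FROM posts AS p
        JOIN users AS u ON u.id = p.user_id
        WHERE strftime('%w', p.posted_on) = '2'
        ORDER BY p.id
    """).fetchall()


rows = tuesday_posts(conn)
print(rows)
```

The query only says what is wanted; how the rows are matched up is left to the database. If this query ran often, adding an index that matches the predicate would be the usual next step, though which index helps depends on the engine.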

There is another reason why RDBMSes are nice. Transactions are easy on RDBMSes, but are much harder to get right on NoSQL databases. Suppose you are implementing a blogging engine, and the post title (which appears in the URL) needs to be unique across all posts. In an RDBMS, you can easily be sure that you won't get this wrong accidentally. With a NoSQL database, if it does support some kind of transactional integrity, it's usually at the shard level, so anything that could possibly require that kind of integrity must be on the same shard. Since any pair of users could be posting at the same moment, every user's post must be on the same shard to get the same effect. Well, then you don't get any benefit at all from NoSQL.
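As a rough illustration of the blog-title case, the sketch below leans on a UNIQUE constraint and a transaction, again with Python's sqlite3 as a stand-in RDBMS. The `posts` table, the `slug` column, and the `publish` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint makes the database itself reject duplicate slugs.
conn.execute("""
    CREATE TABLE posts (
        id   INTEGER PRIMARY KEY,
        slug TEXT NOT NULL UNIQUE,
        body TEXT
    )
""")


def publish(conn, slug, body):
    """Insert a post; return False if the slug is already taken."""
    try:
        # 'with conn' commits on success and rolls back if an exception is raised.
        with conn:
            conn.execute("INSERT INTO posts (slug, body) VALUES (?, ?)",
                         (slug, body))
        return True
    except sqlite3.IntegrityError:
        return False


first = publish(conn, "why-nosql", "written by the first author")
second = publish(conn, "why-nosql", "written by the second author")
count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(first, second, count)
```

The application never has to check for an existing title before inserting; the second insert simply fails and is rolled back, so there is no window in which two posts share a slug.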

SingleNegationElimination answered Oct 14 '22 11:10



The main advantages of NoSQL are horizontal scalability and distributed storage. That means you can have a large number of 'cluster nodes' and write to them in parallel. The cluster will ensure changes are eventually propagated to the other cluster nodes (eventual consistency).
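A toy model may help picture eventual consistency. The sketch below is not how any particular product works: the `Node` class, the `sync` function, and the last-write-wins rule based on a timestamp are all simplifying assumptions made for illustration.

```python
class Node:
    """A toy cluster node holding key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Assumed conflict rule: the write with the newest timestamp wins.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(nodes):
    """One replication round: every node receives every node's entries."""
    snapshot = [list(node.data.items()) for node in nodes]
    for node in nodes:
        for entries in snapshot:
            for key, (timestamp, value) in entries:
                node.write(key, value, timestamp)


a, b = Node("a"), Node("b")
# Two clients write to different nodes in parallel.
a.write("user:1:username", "dave", timestamp=1)
b.write("user:1:username", "david", timestamp=2)

before = (a.read("user:1:username"), b.read("user:1:username"))
sync([a, b])
after = (a.read("user:1:username"), b.read("user:1:username"))
print(before, after)
```

Before the sync round the two nodes disagree about the username; afterwards both hold the later write. Reads made in between can return stale data, which is the trade-off accepted in exchange for parallel writes.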

NoSQL is not so much about avoiding SQL (the term means "not only SQL"). In fact, some NoSQL products do support a subset of SQL. The reason the data format is different (JSON or a list of property/value pairs versus tabular data) is that, within relational databases, the number of columns (and the column names) is defined in a central place, which doesn't work well with horizontal scalability (you would need to stop all cluster nodes for schema changes). Also, joins are often not supported, because they would break horizontal scalability (if the data is distributed, data from multiple cluster nodes may need to be read).
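The two points above (no central column list, and joins needing data from several nodes) can be sketched with a toy key-hashing scheme. Everything here is an assumption for illustration: the three-node cluster, the `node_for`, `put`, `get` and `nodes_touched` helpers, and the simple modulo placement, which real systems typically replace with something more elaborate.

```python
import hashlib

NODE_COUNT = 3  # hypothetical cluster size
nodes = [{} for _ in range(NODE_COUNT)]  # each dict stands in for one node


def node_for(key):
    """Pick a node by hashing the row key (stable across runs, unlike hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


def put(key, columns):
    # Each row carries its own column names; there is no shared schema to alter.
    nodes[node_for(key)][key] = dict(columns)


def get(key):
    return nodes[node_for(key)].get(key)


def nodes_touched(keys):
    """Which nodes would have to be contacted to read all of these rows?"""
    return {node_for(key) for key in keys}


put("user:1", {"username": "dave", "password": "blahblah",
               "dateReged": "1/1/1"})
put("user:2", {"username": "etc", "password": "blahblah",
               "dateReged": "2/1/1",
               "comment": "this guy has a comment and dave doesn't"})

print(get("user:1"))
print(nodes_touched(["user:1", "user:2"]))
```

Adding the `comment` column to one row needed no coordination between nodes. Reading a single row touches exactly one node, but anything that combines many rows, as a join does, may have to contact several of them.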

Thomas Mueller answered Oct 14 '22 13:10
