Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can denormalization be attribute of NoSQL DB

While discussing NoSQL DBs against traditional RDBMS, many articles say that, in NoSQL-DB all related data is kept together so joins are avoided. Thus retrieving data is faster. In short data is denormalized. There are downsides of denormalization as well. e.g. redundancy, extra space, need to update data at multiple places etc.

But irrespective of Pros-and-cons of denormalization; it is a DB design attribute. How can it be attributed to particular DB-type ? If in a given case, it is ok to denormalize data then same can be achieved in RDBMs also.

So why is denormalization discussed as attribute of NoSQL db ?

like image 261
Kaushik Lele Avatar asked Jun 06 '15 06:06

Kaushik Lele


2 Answers

You seem to be reading hype, instead of database design articles. You can denormalize any database. Yes, NoSQL is for cases where denormalized data is a good thing, for instance, in storing documents, where subdocuments are used instead of joins to another table. This works best when the subdocuments are not duplicated. Of course, if they are duplicated, then you have the usual problems of denormalized data.

Example: Person uses Car. In a relational database, you would have a Persons table and a Cars table and a junction table, perhaps "CarsUsedByPerson". In a NoSQL system, you might have a "car" document embedded within a "person" document.

Of course, if two people use the same car, then you have the same data in multiple places, and you'll need to update it in all such places, or it will be inconsistent.

NoSQL is for cases where you need the performance more than you need the consistency.

like image 147
John Saunders Avatar answered Nov 15 '22 08:11

John Saunders


Seconding John Saunders that you can denormalize data in an RDBMS as well - denormalization is an attribute of most NoSQL databases ("most" meaning "excluding graph databases") because in many cases you MUST denormalize in order to get decent performance.

Continuing with his example, let's say that I've got a Person record, which has a foreign key to a Car record (one car per person in this example to simplify matters), which has a foreign key to a Manufacturer record. For a given person I want the record for that person, for their car, and for their car's manufacturer.

In an RDBMS I can normalize this data and retrieve it all in one query using a join, or I can denormalize this data - the denormalized read is going to be a bit cheaper than the normalized read because joins aren't free, but in this case the difference in read performance probably won't be significant.

My NoSQL database probably doesn't support joins so if I normalize this data then I'll have to do three separate lookups for it, e.g. using a key-value database I'd first retrieve the Person which contains a Car key, then I'd retrieve the Car which contains a Manufacturer key, then I'd retrieve the Manufacturer; if this data were denormalized then I'd only need one lookup, so the performance improvement would be significant. In the rare case that the NoSQL database does support joins then it is almost certainly location agnostic, so the Person, Car, and Manufacturer records might be on different servers or even in different data centers making for a very expensive join.

So an overly simplistic breakdown of your options are:

  1. Traditional RDBMS, good with normalized data but difficult to scale out
  2. NoSQL database, relatively easy to scale out but a bit crap with normalized data
  3. Distributed OLAP database (e.g. Aster, Greenplum), relatively easy to scale out and good with normalized data but very expensive
like image 23
Zim-Zam O'Pootertoot Avatar answered Nov 15 '22 07:11

Zim-Zam O'Pootertoot