Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why Relational database cannot be scaled horizontally

I know this question has been asked quite a few times, but I have not got any satisfying answer.

I have read many blogs and most of them say that RDBMS cannot be scaled horizontally. The only way to deal with it is by buying bigger machines.

Then I read why they can't be scaled horizontally. People say because they provide solid, mature services according to the ACID properties. My argument to that is can't we drop an RDBMS to provide ACID properties for specific tables. Is that the only reason that it can't be scaled horizontally and we have to consider NoSQL databases.

The second argument that is put up is that NoSQL databases store data as a single unit whereas RDBMS stores data across multiple tables. Thus one piece of data may be in one system and another piece of data which it is referring may be in another system. Hence scaling RDBMS distributedly becomes difficult. My question to them is why can't we store all the related data in a single table rather scattering it across multiple tables if the situation demands. If NoSQL can store data as a single unit in a single collection, why can't RDBMS store data as a single unit in a single table. (For eg, why an order has to be split into order table, customer table and payment table. Why can't they be clubbed into a single table, the way a NoSQL would have stored)

This also allows developers to develop without having to convert in-memory structures to relational structures.

In short, can we make an RDBMS behave like a NoSQL database and make it scale horizontally?

like image 324
Aniket Avatar asked Mar 14 '26 16:03

Aniket


1 Answers

First - what do you mean by 'scaling horizontally'?

To me - scaling horizontally is what we all do in MPP (Massive Parallel Processing) databases - like Vertica, Teradata, DB2 Parallel Edition, NonStop SQL, etc.: You have a very big table, which you distribute evenly across all nodes of your MPP cluster, based, usually, on the hash value of the primary key, or something similar. This is what Hadoop and all other Map-Reduce architecture does, too (while often being less effective, at least as of now).

(just editing to clarify): If you have 10 nodes in your cluster, your big tables all are distributed to have one tenth of their data on each node. Scaling, now, would be to add, for example, 10 nodes, and re-distribute the data so that each table has 1/20 of its data on each node. And MPP databases scale linearly; this means that by doubling the number of nodes, with the same data volume, the queries will now run twice as quick.

You seem to mean something different - and I'm curious on what you might mean.

As to RDBMS having to split everything into several tables:

The 'R' in RDBMS stands for 'Relational'. Before entering a discussion of all this, you should read a basic tutorial on relational algebra. A Relation simply is a set of objects that can all be described with the same attributes. With that, all objects have the same attributes/colums/fields. As soon as this rule is violated, it is not a relation / table anymore.

I strongly suggest that you take a training on relational theory and relational databases, even before starting to play with SQL.

It's a big, big world of its own that you will have the opportunity to explore. And it all boils down to set theory and Boolean and relational algebra. And you can do so many things with it ...

Your question here is just like asking why a bicycle has two wheels.

Or am I missing something?

Marco the Sane

like image 102
marcothesane Avatar answered Mar 17 '26 10:03

marcothesane



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!