Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clustering, Sharding or simple Partition / Replication

We have created a Facebook application and it got a lot of virality. The problem is that our database started getting REALLY FULL (some tables have more than 25 million rows now). It got to the point that the app just stopped working because there was a queue of thousands and thousands of writes to be made.

I need to implement a solution for scaling this app QUICKLY but I'm not sure if I should pursue Sharding or Clustering since I'm not sure what are the pro's and con's of each of them and I was thinking of doing a Partition / Replication approach but I think that doesn't help if the load is on the writes?

like image 715
albertosh Avatar asked Jan 04 '11 14:01

albertosh


People also ask

What is the difference between replication partitioning clustering and sharding?

Sharding and partitioning are both about breaking up a large data set into smaller subsets. The difference is that sharding implies the data is spread across multiple computers while partitioning does not. Partitioning is about grouping subsets of data within a single database instance.

What is the difference between clustering and partitioning?

According to [2] the main difference between clustering and partitioning is that clustering typically implies a bottom-up cell grouping mechanism that generates a large number of small groups (clusters), while partitioning implies a top-down cell grouping mechanism that results in a small number of large groups (parts) ...

What is the difference between clustering and replication?

Database cluster is just multiple nodes running that database. It's same as usual clusters. Now these nodes are not necessarily replicas, which means that the data stored by each node might be difference from other nodes. And this is what the difference between a general cluster and a replica-set is.

What is the difference between replication and partitioning?

Replication: Keep a copy of the same data on several different nodes. Partitioning: Split the database into smaller subsets and distributed the partitions to different nodes. Transactions: Mechanisms to ensure that data is kept consistent in the database.


2 Answers

Clustering/Sharding/Partitioning comes when single node has reached to the point where its hardware cannot bear the load. But your hardware has still room to expand. This is the first lesson I learnt when I started being hit by such issues

like image 192
Abdel Hegazi Avatar answered Sep 20 '22 17:09

Abdel Hegazi


Well, to understand that, you need to understand how MySQL handles clustering. There are 2 main ways to do it. You can either do Master-Master replication, or NDB (Network Database) clustering.

Master-Master replication won't help with write loads, since both masters need to replay every single write issued (so you're not gaining anything).

NDB clustering will work very well for you if and only if you are doing mostly primary key lookups (since only with PK lookups can NDB operate more efficient than a regular master-master setup). All data is automatically partitioned among many servers. Like I said, I would only consider this if the vast majority of your queries are nothing more than PK lookups.


So that leaves two more options. Sharding and moving away from MySQL.

Sharding is a good option for handling a situation like this. However, to take full advantage of sharding, the application needs to be fully aware of it. So you would need to go back and rewrite all the database accessing code to pick the right server to talk to for each query. And depending on how your system is currently setup, it may not be possible to effectively shard...

But another option which I think may suit your needs best is switching away from MySQL. Since you're going to need to rewrite your DB access code anyway, it shouldn't be too hard to switch to a NoSQL database (again, depending on your current setup). There are tons of NoSQL servers out there, but I like MongoDB. It should be able to withstand your write load without worry. Just beware that you really need a 64 bit server to use it properly (with your data volume).

like image 34
ircmaxell Avatar answered Sep 20 '22 17:09

ircmaxell