Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does MongoDB do both sharding and replication at the same time?

For scaling/failover mongodb uses a “replica set” where there is a primary and one or more secondary servers. Primary is used for writes. Secondaries are used for reads. This is pretty much master slave pattern used in SQL programming. If the primary goes down a secondary in the cluster of secondaries takes its place. So the issue of horizontally scaling and failover is taken care of. However, this is not a solution which allows for sharding it seems. A true shard holds only a portion of the entire data, so if the secondary in a replica set is shard how can it qualify as primary when it doesn’t have all of the data needed to service the requests ?

Wouldn't we have to have a replica set for each one of the shards?

This obviously a beginner question so a link that visually or otherwise illustrates how this is done would be helpful.

like image 630
alex sundukovskiy Avatar asked Feb 06 '13 21:02

alex sundukovskiy


People also ask

Can sharding and replication be used together?

Replication and sharding can work together to form something called a sharded cluster, where each shard is replicated in turn to preserve the same high availability.

Is sharding and replication same?

Replication can be simply understood as the duplication of the data-set whereas sharding is partitioning the data-set into discrete parts. By sharding, you divided your collection into different parts. Replicating your database means you make imagers of your data-set.

Why there is a need of MongoDB discuss basic concept of sharding and replication in MongoDB?

MongoDB uses sharding to support deployments with very large data sets and high throughput operations. Database systems with large data sets or high throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust the CPU capacity of the server.

What is replica set and sharding?

A replica set keeps the data in sync across several different instances so that if one of them goes down, we won't lose any data. Logically, each replica set can be seen as a shard. It's transparent to the application, the way MongoDB chooses to shard is we choose a shard key.


2 Answers

Your assumption is correct, each shard contains a separate replica set. When a write request comes in, MongoS finds the right shard for it based on the shard key, and the data is written to the Primary of the replica set contained in that shard. This results in write scaling, as a (well chosen) shard key should distribute writes over all your shards.

like image 66
Alptigin Jalayr Avatar answered Oct 02 '22 13:10

Alptigin Jalayr


A shard is the sum of a primary and secondaries (replica set), so yes, you would have to have a replica set in each shard.

The portion of the entire data is held in the primary and it's shared with the secondaries to maintain consistency. If the primary goes out, a secondary is elected to be the new primary and has the same data as its predecessor to begin serving immediately. That means that the sharded data is still present and not lost.

like image 28
Alderis Shyti Avatar answered Oct 02 '22 12:10

Alderis Shyti