For scaling/failover mongodb uses a “replica set” where there is a primary and one or more secondary servers. Primary is used for writes. Secondaries are used for reads. This is pretty much master slave pattern used in SQL programming. If the primary goes down a secondary in the cluster of secondaries takes its place. So the issue of horizontally scaling and failover is taken care of. However, this is not a solution which allows for sharding it seems. A true shard holds only a portion of the entire data, so if the secondary in a replica set is shard how can it qualify as primary when it doesn’t have all of the data needed to service the requests ?
Wouldn't we have to have a replica set for each one of the shards?
This obviously a beginner question so a link that visually or otherwise illustrates how this is done would be helpful.
Replication and sharding can work together to form something called a sharded cluster, where each shard is replicated in turn to preserve the same high availability.
Replication can be simply understood as the duplication of the data-set whereas sharding is partitioning the data-set into discrete parts. By sharding, you divided your collection into different parts. Replicating your database means you make imagers of your data-set.
MongoDB uses sharding to support deployments with very large data sets and high throughput operations. Database systems with large data sets or high throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust the CPU capacity of the server.
A replica set keeps the data in sync across several different instances so that if one of them goes down, we won't lose any data. Logically, each replica set can be seen as a shard. It's transparent to the application, the way MongoDB chooses to shard is we choose a shard key.
Your assumption is correct, each shard contains a separate replica set. When a write request comes in, MongoS finds the right shard for it based on the shard key, and the data is written to the Primary of the replica set contained in that shard. This results in write scaling, as a (well chosen) shard key should distribute writes over all your shards.
A shard is the sum of a primary and secondaries (replica set), so yes, you would have to have a replica set in each shard.
The portion of the entire data is held in the primary and it's shared with the secondaries to maintain consistency. If the primary goes out, a secondary is elected to be the new primary and has the same data as its predecessor to begin serving immediately. That means that the sharded data is still present and not lost.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With