Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Mongo what is the difference between sharding and replication?

Tags:

mongodb

Replication seems to be a lot simpler than sharding, unless I am missing the benefits of what sharding is actually trying to achieve. Don't they both provide horizontal scaling?

like image 494
nickponline Avatar asked Jul 20 '12 00:07

nickponline


People also ask

What is replication and sharding?

Replication can be simply understood as the duplication of the data-set whereas sharding is partitioning the data-set into discrete parts. By sharding, you divided your collection into different parts. Replicating your database means you make imagers of your data-set.

Which is good replication or sharding?

Replication may help with horizontal scaling of reads if you are OK to read data that potentially isn't the latest. sharding allows for horizontal scaling of data writes by partitioning data across multiple servers using a shard key. It's important to choose a good shard key.

What is sharding in MongoDB?

Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. Database systems with large data sets or high throughput applications can challenge the capacity of a single server.

What is replication in MongoDB?

Replication in MongoDB A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data bearing nodes and optionally one arbiter node. Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.


1 Answers

In the context of scaling MongoDB:

  • replication creates additional copies of the data and allows for automatic failover to another node. Replication may help with horizontal scaling of reads if you are OK to read data that potentially isn't the latest.

  • sharding allows for horizontal scaling of data writes by partitioning data across multiple servers using a shard key. It's important to choose a good shard key. For example, a poor choice of shard key could lead to "hot spots" of data only being written on a single shard.

A sharded environment does add more complexity because MongoDB now has to manage distributing data and requests between shards -- additional configuration and routing processes are added to manage those aspects.

Replication and sharding are typically combined to created a sharded cluster where each shard is supported by a replica set.

From a client application point of view you also have some control in relation to the replication/sharding interaction, in particular:

  • Read preferences
  • Write concerns
like image 134
Stennie Avatar answered Sep 29 '22 23:09

Stennie