Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

is sharding same as distributed database in mongoDB? [closed]

I want to implement mongodb as a distributed database but i cannot find good tutorials for it. Whenever i searched for distributed database in mongodb, it gives me links of sharding, so i am confused if both of them are the same things?

like image 348
sanchit kashyap Avatar asked Jun 12 '15 05:06

sanchit kashyap


People also ask

What is meant by sharding in MongoDB?

Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. Database systems with large data sets or high throughput applications can challenge the capacity of a single server.

Is sharding distributed systems?

Sharding is a mechanism used for splitting and storing a large amount of data into several smaller datasets. In distributed data-storage system, sharding distributes the data across multiple servers, as a result of which each server acts as the source for only a subset of data.

Is sharding same as horizontal scaling?

Horizontal scaling (aka sharding) is when you actually split your data into smaller, independent buckets and keep adding new buckets as needed. A sharded environment has two or more groups of MySQL servers that are isolated and independent from each other.

What is the difference between sharding and replication in MongoDB?

What is the difference between replication and sharding? Replication: The primary server node copies data onto secondary server nodes. This can help increase data availability and act as a backup, in case if the primary server fails. Sharding: Handles horizontal scaling across servers using a shard key.


2 Answers

Generally speaking, if you got a read-heavy system, you may want to use replication. Which is 1 primary with at most 50 secondaries. The secondaries share the read stress while the primary takes care of writes. It is a auto-failover system so that when the primary is down, one of the secondaries would take the job there and becomes a new primary.

Sharding, however, is more flexible. All the Shards share write stress and read stress. That is to say, data are distributed into different Shards. And each shard can be consists of a Replication system and auto-failover works as described above.

I would choose replication first because it's simple and is basically enough for most scenarios. And once it's not enough, you can choose to convert from replication to sharding.

There is also another discussion of differences between replication and sharding for your reference.

like image 200
yaoxing Avatar answered Sep 27 '22 23:09

yaoxing


Just some perspective on distributed databases:

In early nineties a lot of applications were desktop based and had a local database which contained MB/GBs of data.

Now with the advent of web based applications there can be millions of users who use and store their data, this data can run into GB/TB/PB. Storing all this data on a single server is economically expensive so there is a cluster of servers(or commodity hardware) across which data is horizontally partitioned. Sharding is another term for horizontal partitioning of data. For example you have a Customer table which contains 100 rows, you want to shard it across 4 servers, you can pick 'key' based sharding in which customers will be distributed as follows: SHARD-1(1-25),SHARD-2(26-50),SHARD-3(51-75),SHARD-4(76-100)

Sharding can be done in 2 ways:

  1. Hash based

  2. Key based

like image 38
vmr Avatar answered Sep 27 '22 22:09

vmr