I want to implement mongodb as a distributed database but i cannot find good tutorials for it. Whenever i searched for distributed database in mongodb, it gives me links of sharding, so i am confused if both of them are the same things?
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. Database systems with large data sets or high throughput applications can challenge the capacity of a single server.
Sharding is a mechanism used for splitting and storing a large amount of data into several smaller datasets. In distributed data-storage system, sharding distributes the data across multiple servers, as a result of which each server acts as the source for only a subset of data.
Horizontal scaling (aka sharding) is when you actually split your data into smaller, independent buckets and keep adding new buckets as needed. A sharded environment has two or more groups of MySQL servers that are isolated and independent from each other.
What is the difference between replication and sharding? Replication: The primary server node copies data onto secondary server nodes. This can help increase data availability and act as a backup, in case if the primary server fails. Sharding: Handles horizontal scaling across servers using a shard key.
Generally speaking, if you got a read-heavy system, you may want to use replication. Which is 1 primary
with at most 50 secondaries
. The secondaries
share the read stress while the primary
takes care of writes. It is a auto-failover system so that when the primary
is down, one of the secondaries
would take the job there and becomes a new primary
.
Sharding, however, is more flexible. All the Shards
share write stress and read stress. That is to say, data are distributed into different Shards
. And each shard can be consists of a Replication
system and auto-failover works as described above.
I would choose replication
first because it's simple and is basically enough for most scenarios. And once it's not enough, you can choose to convert from replication to sharding.
There is also another discussion of differences between replication and sharding for your reference.
Just some perspective on distributed databases:
In early nineties a lot of applications were desktop based and had a local database which contained MB/GBs of data.
Now with the advent of web based applications there can be millions of users who use and store their data, this data can run into GB/TB/PB. Storing all this data on a single server is economically expensive so there is a cluster of servers(or commodity hardware) across which data is horizontally partitioned. Sharding is another term for horizontal partitioning of data. For example you have a Customer table which contains 100 rows, you want to shard it across 4 servers, you can pick 'key' based sharding in which customers will be distributed as follows: SHARD-1(1-25),SHARD-2(26-50),SHARD-3(51-75),SHARD-4(76-100)
Sharding can be done in 2 ways:
Hash based
Key based
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With