Redis replication and redis sharding (cluster) difference

Tags:

redis

  1. Does anyone know the difference between Redis replication and Redis sharding?
  2. What are they used for? Redis stores data in memory; how does this affect replication/sharding?
  3. Is it possible to use both of them together?
Patrick asked Jan 26 '10 12:01


People also ask

What is the difference between Redis and Redis cluster?

Redis Cluster supports only one database, which suits a large dataset, whereas standalone Redis supports multiple databases. A Redis Cluster client must support redirection, while a client for standalone Redis doesn't need to.

What is Redis sharding?

In Redis, data sharding (partitioning) is the technique of splitting all data across multiple Redis instances so that every instance contains only a subset of the keys. This makes it possible to handle data growth by adding more instances and dividing the data into smaller parts (shards or partitions).

What is Redis replication?

Redis replication is the process that allows Redis instances to be exact copies of master instances. Replication by default is an asynchronous process. Redis replication is non-blocking on the master side as well as on the replica side.

Does Redis use sharding?

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot. There are 16384 hash slots in Redis Cluster, and to compute the hash slot for a given key, we simply take the CRC16 of the key modulo 16384.
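The slot calculation described above can be sketched in Python. This is a minimal illustration, not Redis's own code: it assumes the CRC16 in question is the CCITT/XMODEM variant (which `binascii.crc_hqx` computes when started from 0), the function name `key_slot` is made up for this example, and hash tags (the `{...}` part of a key) are deliberately ignored.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Return the hash slot for a key: CRC16(key) modulo 16384.

    Simplification: hash tags ("{...}" inside a key) are not handled here.
    """
    crc = binascii.crc_hqx(key.encode("utf-8"), 0)  # CRC16, CCITT/XMODEM variant
    return crc % HASH_SLOTS


if __name__ == "__main__":
    for k in ("foo", "user:1000", "user:1001"):
        print(k, "->", key_slot(k))
```

Each node in a cluster owns a range of slots, so a client that can compute the slot knows which node to contact for a given key.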


2 Answers

Sharding is almost replication's antithesis, though they are orthogonal concepts and work well together.

Sharding, also known as partitioning, splits the data up by key, while replication, also known as mirroring, copies all of the data.

Sharding is useful for increasing performance, because it reduces the request and memory load on any one resource. Replication is useful for high availability of reads. If you read from multiple replicas, you also reduce the request load on each resource, but the memory requirement of each resource stays the same. Note that, while you can write to a slave, replication is master->slave only, so you cannot scale writes this way.

Suppose you have the following tuples: [1:Apple], [2:Banana], [3:Cherry], [4:Durian] and we have two machines A and B. With Sharding, we might store keys 2,4 on machine A; and keys 1,3 on machine B. With Replication, we store keys 1,2,3,4 on machine A and 1,2,3,4 on machine B.

Sharding is typically implemented by hashing the key, often with a consistent hash so that adding or removing a machine moves as few keys as possible. The above example used the simple (not consistent) hash function h(x){return x%2==0?A:B}.
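As a rough sketch of the example above, the following Python places the four tuples on machines A and B, first by sharding with h(x) and then by replication. Plain dictionaries stand in for the machines, and the function names are invented for illustration.

```python
from collections import defaultdict

DATA = {1: "Apple", 2: "Banana", 3: "Cherry", 4: "Durian"}


def shard_for(key: int) -> str:
    """h(x) from the example: even keys go to A, odd keys go to B."""
    return "A" if key % 2 == 0 else "B"


def shard(data: dict) -> dict:
    """Sharding: each machine stores only the keys that hash to it."""
    machines = defaultdict(dict)
    for key, value in data.items():
        machines[shard_for(key)][key] = value
    return dict(machines)


def replicate(data: dict, machines=("A", "B")) -> dict:
    """Replication: every machine stores a full copy of the data."""
    return {name: dict(data) for name in machines}


sharded = shard(DATA)
replicated = replicate(DATA)

if __name__ == "__main__":
    print("sharded:   ", sharded)
    print("replicated:", replicated)
```

With sharding each machine holds half of the keys; with replication each machine holds all of them, which is why replication does not reduce the memory needed per machine.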

To combine the concepts, we might replicate each shard. In the above example, all of the data (2,4) on machine A could be replicated on machine C, and all of the data (1,3) on machine B could be replicated on machine D.
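A minimal sketch of that combined layout, again with dictionaries standing in for machines. The names `build_cluster` and `read` are hypothetical, and the replica copy is made synchronously here only to keep the example short; real Redis replication is asynchronous.

```python
DATA = {1: "Apple", 2: "Banana", 3: "Cherry", 4: "Durian"}

# Each shard has one primary and one replica: A -> C and B -> D.
REPLICA_OF = {"A": "C", "B": "D"}


def primary_for(key: int) -> str:
    """Same hash as the example: even keys on A, odd keys on B."""
    return "A" if key % 2 == 0 else "B"


def build_cluster(data: dict) -> dict:
    """Write each key to its primary, then copy it to that primary's replica."""
    stores = {name: {} for name in ("A", "B", "C", "D")}
    for key, value in data.items():
        primary = primary_for(key)
        stores[primary][key] = value
        stores[REPLICA_OF[primary]][key] = value
    return stores


def read(stores: dict, key: int, use_replica: bool = False) -> str:
    """Read from the shard's primary, or from its replica to spread read load."""
    primary = primary_for(key)
    node = REPLICA_OF[primary] if use_replica else primary
    return stores[node][key]


cluster = build_cluster(DATA)

if __name__ == "__main__":
    print(cluster)
    print(read(cluster, 3, use_replica=True))
```

Writes always go to the shard's primary, so sharding spreads the write load, while the replicas add read capacity and a spare copy of each shard.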

Any key-value store (of which Redis is only one example) supports sharding, though certain cross-key functions will no longer work. Redis supports replication out of the box.

Alex answered Sep 28 '22 04:09


In simple terms, the fundamental difference between the two concepts is that sharding is used to scale writes while replication is used to scale reads. As Alex already mentioned, replication is also one of the ways to achieve high availability (HA).

Yes, they are both typically used together if you consider how shards can be replicated across nodes in a cluster.

With regard to your question about Redis storing data in memory: instead of the RAM-flush option, it is a better idea to use the Redis Append Only File (AOF). At only a minor cost in write speed, you get much more reliable writes. It is quite like the MySQL binary log. Fsyncing once per second (`appendfsync everysec`) is the recommended setting.

gshx answered Sep 28 '22 05:09