I am trying to figure out if I am on the right track. I am building a (real-time) statistics/analytics service and I use redis to store some sets and hashes. Now let's assume I have some success and I need to scale out. The hash ring technique looks nice, but I have an impression that it is only suited for caching scenarios. What if a node goes down? In theory, its keys are now owned by other nodes. In practice, they won't have the data. It is lost, right? Same with adding / removing nodes. Am I missing some fundamental thing? Can this be a poor man's cluster?

There are two reasons to use multiple nodes in a cluster: <ul> <li>Sharding to limit the amount of data stored on each node</li> <li>Duplication to reduce read load and allow a node to be removed without data loss.</li> </ul> The two are fundamentally different, but you can implement both - use consistent hashing to point to a set of nodes with a standard master/slave setup rather than a single node. If the cluster is your primary data store rather than a cache, you will need a different redistribution strategy that includes copying the data. My implementation is based on having the client choose one of 64k buckets for a hash and having a table that maps that bucket to a node. Initially, all map to node #1. When node #1 gets too large, its slave becomes master node #2 and the table is updated to map half of the node #1 keys to node #2. At this point all reads and writes will work with the new mapping and you just need to clean up the keys that are now on the wrong node. Depending on the performance requirements, you can check all keys at once or check a random selection of keys as the expiry system does.

Consistent hashing as a way to scale writes

1 Answers

There are two reasons to use multiple nodes in a cluster:

Sharding to limit the amount of data stored on each node
Duplication to reduce read load and allow a node to be removed without data loss.

The two are fundamentally different, but you can implement both - use consistent hashing to point to a set of nodes with a standard master/slave setup rather than a single node.

If the cluster is your primary data store rather than a cache, you will need a different redistribution strategy that includes copying the data.

My implementation is based on having the client choose one of 64k buckets for a hash and having a table that maps that bucket to a node. Initially, all map to node #1.

When node #1 gets too large, its slave becomes master node #2 and the table is updated to map half of the node #1 keys to node #2. At this point all reads and writes will work with the new mapping and you just need to clean up the keys that are now on the wrong node. Depending on the performance requirements, you can check all keys at once or check a random selection of keys as the expiry system does.

181

answered Sep 21 '22 01:09

Tom Clarkson

Related questions
                            
                                Convert SHA Hash Computation in Python to C#
                            
                                Lookup table for unhashable in Python
                            
                                substr md5 collision
                            
                                Produce MD5 or SHA1 hash code to long (64 bits)
                            
                                Universal hashing
                            
                                How to calculate the time needed to calculate a SHA-256 hash?
                            
                                Using a more secure hashing algorithm with CakePHP
                            
                                Subtract values in hash from corresponding values in another hash
                            
                                How to hash an unordered_map?
                            
                                Ruby Hash with integer keys changed to string keys
                            
                                Cryptographic hash functions in Python
                            
                                Fetch subkeys from a hash in redis
                            
                                Authentication with hashing
                            
                                Replace MULTIPLE first keys of multi-dimensional Hash in perl
                            
                                Using Python to authenticate against raw username, hash, salt in DB created by ASP.NET roles/membership
                            
                                Good hash algorithm for list of (memory) addresses
                            
                                Difference between a keyed hash and a non-keyed hash?
                            
                                understanding of hash code
                            
                                Rails: How to sort/re-order an OrderedHash
                            
                                CRC32+Size vs MD5/SHA1

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Consistent hashing as a way to scale writes

Tags:

redis

hash

consistent-hashing

Sergio Tulentsev

People also ask

1 Answers

Tom Clarkson

Recent Activity

Donate For Us