I am trying to understand how consistent hashing works. This is the article which I am trying to follow but not able to follow, to start with my questions are:
I understand, servers are mapped into ranges of hashcodes and the data distribution is more fixed and look becomes easy. But how does this deal with the problem a new node is added in the cluster?
The sample java code is not working, any suggestion of a simple java based consistent hashing.
Update
I will answer the first part of your question. First of all, there are some errors in that code, so I would look for a better example.
Using a cache server as the example here.
When you think about consistent hashing, you should think of it as a circular ring, as the article you linked to does. When a new server is added, it will have no data on it to start with. When a client fetches data that should be on that server and does not find it, a cache-miss will occurs. The program should then fill in the data on the new node, so future requests will be a cache-hit. And that is about it, from a caching point of view.
For python implementation Refer my github repo
Simplest Explanation What is normal hashing ?
Let's say we have to store the following key value pair in a distributed memory store like redis.
Let say we have a hash function f(id) ,that takes above ids and creates hashes from it . Assume we have 3 servers - (s1 , s2 and s3)
We can do a modulo of hash by the no of servers ie 3 , to map each each key to a server and we are left with following.
We could retrieve the value for a key by simple lookup using f(). Say for key Jackson , f("Jackson")%(no of servers) => 1211*3 = 2 (node-2).
This looks perfecto , yea close but not cigar !
But What if a server say node-1 went down ? Applying the same formula ie f(id)%(no of servers) for user Jackson, 1211%2 = 1 ie we got node-1 when the actual key is hashed to node-2 from the above table .
We could do remapping here , What if we have a billion keys ,in that case we have to remap a large no of keys which is tedious :(
This is a major flow in traditional hashing technique.
What is Consistent Hashing ?
In Consistent hashing , we visualize list of all nodes in a circular ring .(Basically a sorted array)
start func
For each node:
Find f(node) where f is the hash function
Append each f(node) to a sorted array
For any key
Compute the hash f(key)
Find the first f(node)>f(key)
map it
end func
For example, if we have to hash key smith, we compute the hash value 1123 , find the immediate node having hash value > 1123 ie node 3 with hash value 1500
Now , What if we loose a server , say we loose node-2 , All the keys can be mapped to next server node-3 :) Yea , we only have to remap the keys of node-2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With