Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does consistent hashing work?

I am trying to understand how consistent hashing works. This is the article which I am trying to follow but not able to follow, to start with my questions are:

  1. I understand, servers are mapped into ranges of hashcodes and the data distribution is more fixed and look becomes easy. But how does this deal with the problem a new node is added in the cluster?

  2. The sample java code is not working, any suggestion of a simple java based consistent hashing.

Update

  1. Any alternatives to consistent hashing?
like image 722
zengr Avatar asked Sep 29 '10 02:09

zengr


2 Answers

I will answer the first part of your question. First of all, there are some errors in that code, so I would look for a better example.

Using a cache server as the example here.

When you think about consistent hashing, you should think of it as a circular ring, as the article you linked to does. When a new server is added, it will have no data on it to start with. When a client fetches data that should be on that server and does not find it, a cache-miss will occurs. The program should then fill in the data on the new node, so future requests will be a cache-hit. And that is about it, from a caching point of view.

like image 140
staackuser2 Avatar answered Sep 20 '22 15:09

staackuser2


For python implementation Refer my github repo

Simplest Explanation What is normal hashing ?

Let's say we have to store the following key value pair in a distributed memory store like redis.

enter image description here

Let say we have a hash function f(id) ,that takes above ids and creates hashes from it . Assume we have 3 servers - (s1 , s2 and s3)

We can do a modulo of hash by the no of servers ie 3 , to map each each key to a server and we are left with following.

enter image description here

We could retrieve the value for a key by simple lookup using f(). Say for key Jackson , f("Jackson")%(no of servers) => 1211*3 = 2 (node-2).

This looks perfecto , yea close but not cigar !

But What if a server say node-1 went down ? Applying the same formula ie f(id)%(no of servers) for user Jackson, 1211%2 = 1 ie we got node-1 when the actual key is hashed to node-2 from the above table .

We could do remapping here , What if we have a billion keys ,in that case we have to remap a large no of keys which is tedious :(

This is a major flow in traditional hashing technique.

What is Consistent Hashing ?

In Consistent hashing , we visualize list of all nodes in a circular ring .(Basically a sorted array)

ConsistantHashingRing

start func
For each node:
 Find f(node) where f is the hash function
 Append each f(node) to a sorted array
For any key
  Compute the hash f(key)
  Find the first f(node)>f(key)
  map it
end func

For example, if we have to hash key smith, we compute the hash value 1123 , find the immediate node having hash value > 1123 ie node 3 with hash value 1500

Now , What if we loose a server , say we loose node-2 , All the keys can be mapped to next server node-3 :) Yea , we only have to remap the keys of node-2

enter image description here

like image 29
melvil james Avatar answered Sep 20 '22 15:09

melvil james