 

How should I use Guava's Hashing#consistentHash?


I'm looking into using a consistent hash algorithm in some java code I'm writing. The guava Hashing library has a consistentHash(HashCode, int) method, but the documentation is rather lacking. My initial hope was that I could just use consistentHash() for simple session affinity to efficiently distribute load across a set of backend servers.

Does anyone have a real-world example of how to use this method? In particular I'm concerned with managing the removal of a bucket from the target range.

For example:

@Test
public void testConsistentHash() {
    List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");

    int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size());
    System.out.println("First time routed to: " + servers.get(bucket));

    // one of the back end servers is removed from the (middle of the) pool
    servers.remove(1);

    // hash the same identifier again to see where it is routed now
    bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size());
    System.out.println("Second time routed to: " + servers.get(bucket));
}

Leads to the output:

First time routed to: server4
Second time routed to: server5

What I want is for that identifier ("someId") to map to the same server after removal of a server earlier in the list. So in the sample above, after removal I guess I'd want bucket 0 to map to "server1", bucket 1 to map to "server3", bucket 2 to map to "server4" and bucket 3 to map to "server5".

Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition? I guess I had envisioned a richer Hashing API that would manage the remapping after the addition and removal of particular buckets for me.

Note: I know the sample code uses a small input and bucket set. I tried this with thousands of inputs across 100 buckets and the result is the same: inputs that map to buckets 0-98 stay in place when I change the bucket count to 99, and the inputs from bucket 99 get distributed across the remaining 99 buckets.
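In other words, this matches the documented behaviour of consistentHash: going from N to N-1 buckets only remaps the keys that were in the last bucket. A minimal sketch of that experiment (the class name and the numeric keys are just for illustration; it uses the long overload of consistentHash and assumes Guava is on the classpath):

import com.google.common.hash.Hashing;

public class ConsistentHashShrinkDemo {
    public static void main(String[] args) {
        int moved = 0;
        for (long key = 0; key < 10_000; key++) {
            int before = Hashing.consistentHash(key, 100); // 100 buckets
            int after = Hashing.consistentHash(key, 99);   // last bucket removed
            if (before != after) {
                moved++;
                // Only keys that sat in the removed bucket (index 99) should move.
                if (before != 99) {
                    throw new AssertionError("unexpected remap from bucket " + before);
                }
            }
        }
        System.out.println(moved + " of 10000 keys were remapped");
    }
}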

GamingBuck asked Sep 07 '12




1 Answer

I'm afraid no data structure can get this truly right with the current consistentHash. Since the method accepts only the list size, nothing but appending to and removing from the end of the list can be supported. Currently, the best solution probably consists of replacing

servers.remove(n) 

by

servers.set(n, servers.get(servers.size() - 1));
servers.remove(servers.size() - 1);

This way you effectively swap the failed server with the very last one. That looks bad, since it makes the assignments to the two swapped servers wrong, but the problem is only half as bad because one of them has already failed. It also makes sense: after the subsequent removal of the last list element, everything is fine except for the assignments to the failed server and to the previously last server.

So only twice as many assignments as strictly necessary change. Not optimal, but hopefully usable?
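A rough sketch of that swap-and-remove idea applied to the question's setup (the class and method names here are purely illustrative, and hashString is given an explicit charset):

import com.google.common.base.Charsets;
import com.google.common.hash.Hashing;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SwapRemoveSketch {

    // Route an id to a server by consistent-hashing over the current list size.
    static String route(List<String> servers, String id) {
        int bucket = Hashing.consistentHash(
                Hashing.md5().hashString(id, Charsets.UTF_8), servers.size());
        return servers.get(bucket);
    }

    // Remove the server at index n by overwriting it with the last server and
    // dropping the last slot, so only buckets n and (size - 1) are disturbed.
    static void swapRemove(List<String> servers, int n) {
        int last = servers.size() - 1;
        servers.set(n, servers.get(last));
        servers.remove(last);
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>(Arrays.asList(
                "server1", "server2", "server3", "server4", "server5"));

        System.out.println("Before: " + route(servers, "someId"));

        // "server2" fails; swap-remove it instead of calling servers.remove(1).
        swapRemove(servers, 1);

        System.out.println("After:  " + route(servers, "someId"));
    }
}

Only ids that hashed to the failed server's bucket or to the previously last bucket end up routed differently; everything else keeps its server.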

maaartinus answered Sep 22 '22