Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java: simplest integer hash

Tags:

java

hash

I need a quick hash function for integers:

 int hash(int n) { return ...; }

Is there something that exists already in Java?

The minimal properties that I need are:

  • hash(n) & 1 does not appear periodic when used with a bunch of consecutive values of n.
  • hash(n) & 1 is approximately equally likely to be 0 or 1.
like image 579
Jason S Avatar asked Mar 08 '12 21:03

Jason S


2 Answers

HashMap, as well as Guava's hash-based utilities, use the following method on hashCode() results to improve bit distributions and defend against weaker hash functions:

  /*
   * This method was written by Doug Lea with assistance from members of JCP
   * JSR-166 Expert Group and released to the public domain, as explained at
   * http://creativecommons.org/licenses/publicdomain
   * 
   * As of 2010/06/11, this method is identical to the (package private) hash
   * method in OpenJDK 7's java.util.HashMap class.
   */
  static int smear(int hashCode) {
    hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
    return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
  }
like image 57
Louis Wasserman Avatar answered Sep 18 '22 13:09

Louis Wasserman


So, I read this question, thought hmm this is a pretty math-y question, it's probably out of my league. Then, I ended up spending so much time thinking about it that I actually believe I've got the answer: No function can satisfy the criteria that f(n) & 1 is non-periodic for consecutive values of n.

Hopefully someone will tell me how ridiculous my reasoning is, but until then I believe it's correct.

Here goes: Any binary integer n can be represented as either 1...0 or 1...1, and only the least significant bit of that bitmap will affect the result of n & 1. Further, the next consecutive integer n + 1 will always contain the opposite least significant bit. So, clearly any series of consecutive integers will exhibit a period of 2 when passed to the function n & 1. So then, is there any function f(n) that will sufficiently distribute the series of consecutive integers such that periodicity is eliminated?

Any function f(n) = n + c fails, as c must end in either 0 or 1, so the LSB will either flip or stay the same depending on the constant chosen.

The above also eliminates subtraction for all trivial cases, but I have not taken the time to analyze the carry behavior yet, so there may be a crack here.

Any function f(n) = c*n fails, as the LSB will always be 0 if c ends in 0 and always be equal to the LSB of n if c ends in 1.

Any function f(n) = n^c fails, by similar reasoning. A power function would always have the same LSB as n.

Any function f(n) = c^n fails, for the same reason.

Division and modulus were a bit less intuitive to me, but basically, the LSB of either option ends up being determined by a subtraction (already ruled out). The modulus will also obviously have a period equal to the divisor.

Unfortunately, I don't have the rigor necessary to prove this, but I believe any combination of the above operations will ultimately fail as well. This leads me to believe that we can rule out any transcendental function, because these are implemented with polynomials (Taylor series? not a terminology guy).

Finally, I held out hope on the train ride home that counting the bits would work; however, this is actually a periodic function as well. The way I thought about it was, imagine taking the sum of the digits of any decimal number. That sum obviously would run from 0 through 9, then drop to 1, run from 1 to 10, then drop to 2... It has a period, the range just keeps shifting higher the higher we count. We can actually do the same thing for the sum of the binary digits, in which case we get something like: 0,1,1,2,2,....5,5,6,6,7,7,8,8....

Did I leave anything out?

TL;DR I don't think your question has an answer.

like image 22
Tim Avatar answered Sep 18 '22 13:09

Tim