Does anyone know a hashing function (for Strings, if it matters) for a variable range of buckets, which is always odd (or prime, if necessary)?
Essentially, I'm looking for a hashing function which will provide a uniform distribution over n buckets, where n is odd (or prime, since n will be small).
Java's .hashCode() provides uniform distribution, but only for powers of 2.
Here is some quick test code I whipped up which asserts this.
I cross-posted this on the CS Theory StackExchange since it seems to fall somewhere between theory & engineering.
Running your program with 37 as buckets length, and the hashing part replaced with
for (String key : keys) {
int hash = key.hashCode();
int index = Math.abs(hash % buckets.length);
buckets[index] = buckets[index] + 1;
}
leads to the following result:
Bucket 0: 4152
Bucket 1: 2593
Bucket 2: 2703
Bucket 3: 2620
Bucket 4: 2742
Bucket 5: 2647
Bucket 6: 2707
Bucket 7: 2673
Bucket 8: 2664
Bucket 9: 2685
Bucket 10: 2734
Bucket 11: 2708
Bucket 12: 2661
Bucket 13: 2678
Bucket 14: 2681
Bucket 15: 2662
Bucket 16: 2682
Bucket 17: 2667
Bucket 18: 2619
Bucket 19: 2572
Bucket 20: 2608
Bucket 21: 2669
Bucket 22: 2670
Bucket 23: 2629
Bucket 24: 2748
Bucket 25: 2651
Bucket 26: 2618
Bucket 27: 2628
Bucket 28: 2740
Bucket 29: 2608
Bucket 30: 2650
Bucket 31: 2645
Bucket 32: 2687
Bucket 33: 2699
Bucket 34: 2627
Bucket 35: 2715
Bucket 36: 2558
Mean: 2702.7027027027025
Standard Deviation: 245.8085241264752
which looks quite good.
You're not testing the distribution of String.hashCode(). You're testing the distribution if HashMap's hash() method, which uses the key's hashCode, and has been designed to try getting a uniform distribution for its capacity, which MUST be a power of 2. If hashCode() already returns well-distributed values, simply taking the modulo using a prime number as divisor will lead to a good distribution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With