Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash function for two integer arrays with minimal collisions

I'm working on a problem where I want to store objects consisting of two integer arrays of equal length (e.g. int a[] ={1,2,3,4} and int b[] ={1,2,2,6}) in a data structure (e.g. hashmap) . However, for different objects, the length of the two arrays can vary. Both arrays are consisting of integers from a given interval (e.g. between 0-200).

For storing the objects with the two arrays, I want to assign a single hashkey that is fast to compute, preserves both sequences, and will lead to minimal collisions.

I have first tried using Arrays.deepHashCode(int[][]) but quickly found collisions. Second, I tried to distribute the values in the arrays more equally, by changing a[i] and b[i] to new values so that a_new[i] = Math.pow(31,a[i]) % Math.pow(2,16) (actually using BigInteger to avoid overflow: BigInteger.valueOf(31).modPow(BigInteger.valueOf(a[i]), BigInteger.valueOf(Math.pow(2,16))); using BigInteger. Since the interval of values is limited I can pre-calculate it for each possible value. As a result I came up with following solution:

    int result = 31;
    for (int i = 0; i < a.length; i++) {
        result = result * 31 * a_new[i];
        result = result * 31 * b_new[i];
    }

This solution seems to work when there are only smaller arrays, but once a[] and b[] can contain up to 10 values it also leads to collisions. Now I wonder, if there is a better way to achieve what i want with less collisions.

Edit: I fixed it to use a proper java code for the powers

like image 290
Christian Avatar asked Apr 06 '26 04:04

Christian


2 Answers

Perhaps instead of multiplying each a[i] and b[i] by 31, you could store an array of primes, and the current number in the array by prime[i]?

Something like this:

int result = 31;
int[] primes = {3, 5, 7, 11, 13, 17, 19, 23, ... };
    for (int i = 0; i < a.length; i++) {
        result = result * primes[i % primes.length] * a_new[i]
        result = result * primes[i % primes.length] * b_new[i]
    }

You can also try with larger primes to make collisions less likely.

like image 73
McKay M Avatar answered Apr 08 '26 17:04

McKay M


Just stating the obvious...

return Objects.hash(a_new, b_new);
like image 40
Bohemian Avatar answered Apr 08 '26 17:04

Bohemian



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!