Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to calculate hash for Integer array without collision

Tags:

The Java methods, Arrays.hashCode() or Objects.hash() return same hash for some Integer arrays with different content such as

Integer[] a = {0,4,5,0}     // hash 927520
Integer[] b = {0,3,36,0}    // hash 927520

Same result is returned by the custom hashcode method such as:

public int hash(final Integer[] indexes) {
    final int prime = 31;
    int result = 1;
    for (Integer i : indexes) {
        result = prime * result + ((i == null) ? 0 : i.hashCode());
    }
    return result;
}

I agree that this is the expected behavior. But, I want to generate distinct hashcode for them as contents are different.

What is the fastest way to calculate hash for Integer array without collision

like image 850
Maithilish Avatar asked Sep 24 '18 11:09

Maithilish


2 Answers

The problem is a bit different. First think of why you need hashCode to begin with = for fast(er) look-ups. Having two objects that would generate the same hash is not a problem at all, since that does not yet mean they are the same, of course (you would still check against equals).

You have already a few comments under your question, saying that this is impossible, I just want to add some interesting things that you have not thought about (may be you simply don't know them).

In general, hash collisions are much more frequent in java data structures that you might imagine. According to the Birthday problem and taking into consideration that a hash is actually 32 bits, we get to the fact that it would take only 77,164 unique values before there is a 50% chances to generate a collision (and that is in the best case). So collisions are more than fine. That being said, there is JEP to improve this (in my understanding by first making the hash - a long and working off that; but have not deep dived into it much).

Now that you know that hash collisions are more that fine, think of why they are used. Basically for fast(er) look-up. When there are two entries that have the same hash, it means they will end in the same "bucket" and in java, that bucket, is a perfectly balanced red-black tree (for HashMap and thus, HashSet) - that is still super fast when looking for entries. Thus, in general, any hash based structure has a search time that is constant (i.e.: amortized O(1)), so worry not about hash collisions.

like image 93
Eugene Avatar answered Oct 21 '22 10:10

Eugene


There is no way to satisfy your requirements.

You have to understand that hashing functions can not create a bidirectional mapping. And that is what you would need here!

Meaning: there is a (close to) infinite number of arrays with arbitrary int values. If each of the hashes should uniquely point to a specific array setup, you could identify each array by its hash. But the range of int (or long) is not indefinite. There are simply more possible array combinations than int values to count them!

You can't map an indefinite set onto a set that is not indefinite.

In other words: if such a hashing method would exist, you could turn it into a compression algorithm that would reduce any content to a single int value.

So: collisions are an inherent property of hashing algorithms. You can't avoid them. If at all, you can fine tune a specific hashing function to minimize the collisions for a specific set of input data. But as said: what you are asking for is impossible from a conceptual/mathematical point of view.

like image 33
GhostCat Avatar answered Oct 21 '22 08:10

GhostCat