The Java methods, Arrays.hashCode() or Objects.hash() return same hash for some Integer arrays with different content such as
Integer[] a = {0,4,5,0} // hash 927520
Integer[] b = {0,3,36,0} // hash 927520
Same result is returned by the custom hashcode method such as:
public int hash(final Integer[] indexes) {
final int prime = 31;
int result = 1;
for (Integer i : indexes) {
result = prime * result + ((i == null) ? 0 : i.hashCode());
}
return result;
}
I agree that this is the expected behavior. But, I want to generate distinct hashcode for them as contents are different.
What is the fastest way to calculate hash for Integer array without collision
The problem is a bit different. First think of why you need hashCode
to begin with = for fast(er) look-ups. Having two objects that would generate the same hash is not a problem at all, since that does not yet mean they are the same, of course (you would still check against equals
).
You have already a few comments under your question, saying that this is impossible, I just want to add some interesting things that you have not thought about (may be you simply don't know them).
In general, hash collisions
are much more frequent in java data structures that you might imagine. According to the Birthday problem and taking into consideration that a hash
is actually 32 bits
, we get to the fact that it would take only 77,164 unique values before there is a 50%
chances to generate a collision (and that is in the best case). So collisions are more than fine. That being said, there is JEP to improve this (in my understanding by first making the hash - a long
and working off that; but have not deep dived into it much).
Now that you know that hash collisions are more that fine, think of why they are used. Basically for fast(er) look-up. When there are two entries that have the same hash
, it means they will end in the same "bucket" and in java, that bucket, is a perfectly balanced red-black tree (for HashMap
and thus, HashSet
) - that is still super fast when looking for entries. Thus, in general, any hash based structure has a search time that is constant (i.e.: amortized O(1)
), so worry not about hash collisions.
There is no way to satisfy your requirements.
You have to understand that hashing functions can not create a bidirectional mapping. And that is what you would need here!
Meaning: there is a (close to) infinite number of arrays with arbitrary int values. If each of the hashes should uniquely point to a specific array setup, you could identify each array by its hash. But the range of int (or long) is not indefinite. There are simply more possible array combinations than int values to count them!
You can't map an indefinite set onto a set that is not indefinite.
In other words: if such a hashing method would exist, you could turn it into a compression algorithm that would reduce any content to a single int value.
So: collisions are an inherent property of hashing algorithms. You can't avoid them. If at all, you can fine tune a specific hashing function to minimize the collisions for a specific set of input data. But as said: what you are asking for is impossible from a conceptual/mathematical point of view.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With