
What Exactly is Hash Collision

Hash collision (or hashing collision) in HashMap is not a new topic, and I've come across several blogs and discussion boards that explain how to produce a hash collision or how to avoid one, either ambiguously or in excessive detail. I was recently asked about it in an interview. I had a lot to say, but I found it really hard to give a precise explanation. Sorry if my questions repeat earlier ones; if so, please point me to the precise answer:

  1. What exactly is Hash Collision - is it a feature, or a common phenomenon which is caused by mistake and is good to avoid?
  2. What exactly causes Hash Collision - a bad definition of a custom class' hashCode() method, OR leaving the equals() method un-overridden while imperfectly overriding the hashCode() method alone, OR is it not up to the developers, and do many popular Java libraries also have classes which can cause Hash Collision?
  3. Does anything go wrong or unexpected when Hash Collision happens? I mean, is there any reason why we should avoid Hash Collision?
  4. Does Java generate, or at least try to generate, a unique hashCode per class during object instantiation? If not, is it right to rely on Java alone to ensure that my program does not run into Hash Collision for JRE classes? If that is not right, then how do I avoid hash collisions for hashmaps with final classes like String as the key?

I'll be grateful if you could please share your answers to one or all of these questions.

sribasu asked Aug 21 '17 11:08




1 Answer

What exactly is Hash Collision - is it a feature, or common phenomenon which is mistakenly done but good to avoid?

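It's a feature, in the sense that it is inherent. It arises out of the nature of a hash code: a mapping from a large value space to a much smaller one. In Java, hashCode() returns a 32-bit int, so any type with more than 2^32 possible values (String, for example) must map some distinct values to the same hash code. There are going to be collisions, by design and intent.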
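
As a small illustration of that point, here is a sketch using two short strings that are well known to share a hash code under String.hashCode(). The class name CollisionDemo and the helper collide() are made up for this example.

```java
class CollisionDemo {
    // Hypothetical helper: true when two distinct strings share a hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println("Aa".hashCode());      // 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(collide("Aa", "BB"));  // true
    }
}
```

Nothing is broken here: the two strings are different keys that simply happen to share a hash code.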

What exactly causes Hash Collision - the bad definition of custom class' hashCode() method,

A badly designed hashCode() can make collisions more frequent, but collisions are inherent in the very notion of hashing.

OR to leave the equals() method un-overridden while imperfectly overriding the hashCode() method alone,

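No. That does not cause collisions; it causes a different problem, because keys are then compared by identity.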
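
To show what overriding hashCode() alone actually does, here is a minimal sketch. The Point class and the lookup() helper are hypothetical, invented only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MissingEqualsDemo {
    // Hypothetical key type: hashCode() is overridden, equals() is not.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public int hashCode() { return 31 * x + y; }
        // equals() is inherited from Object, so it compares identity only.
    }

    // Stores one Point and looks it up with a second, field-for-field identical Point.
    static String lookup() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        // Same hash code and same bucket, but equals() says "different object".
        return map.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // null
    }
}
```

The lookup reaches the right bucket and then fails the equals() check, so the result is a missed entry rather than a collision.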

OR is it not up to the developers and many popular java libraries also has classes which can cause Hash Collision?

This doesn't really make sense. Hashes are bound to collide sooner or later, and poor algorithms can make it sooner. That's about it.

Does anything go wrong or unexpected when Hash Collision happens?

Not if the hash table is competently written. A hash collision only means that the hashCode is not unique, so the table has to fall back on calling equals() to tell the colliding keys apart, and the more collisions there are, the worse the performance.
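
A sketch of the worst case, assuming a deliberately bad key type whose hashCode() is constant, suggests that java.util.HashMap still returns correct results when every key collides. The Key class and build() helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical key type: every instance collides, but equals() is correct.
    static final class Key {
        private final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    // Builds a map of n entries that all land in the same bucket.
    static Map<Key, Integer> build(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key("k" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build(100);
        System.out.println(map.size());             // 100
        System.out.println(map.get(new Key("k7"))); // 7
    }
}
```

The map stays correct; what degrades is lookup time, since each get() has to search among the colliding entries using equals().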

I mean is there any reason why we should avoid Hash Collision?

Only for the sake of performance. When writing a hashCode() you have to trade off ease of computation against spread of values. There is no single black-and-white answer.

Does Java generate or at least try to generate unique hashCode per class during object initiation?

No. 'Unique hash code' is a contradiction in terms.

If no, is it right to rely on Java alone to ensure that my program would not run into Hash Collision for JRE classes? If not right, then how to avoid hash collision for hashmaps with final classes like String as key?

The question is meaningless. If you're using String you don't have any choice about the hashing algorithm, and you are also using a class whose hashCode has been slaved over by experts for twenty or more years.
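
For reference, the algorithm you get with String is documented in the String.hashCode() Javadoc as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic. The sketch below re-implements that formula to show it matches; the class name StringHashDemo and the method polynomialHash() are made up for this example.

```java
class StringHashDemo {
    // Hypothetical re-implementation of the formula documented for String.hashCode().
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // int overflow wraps around silently, which the formula relies on.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hash collision";
        System.out.println(polynomialHash(s) == s.hashCode()); // true
    }
}
```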

user207421 answered Sep 21 '22 19:09
