Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a hash function in java?

I have check out this Wikipedia page on it, but I still don't understand it. Can someone please help my dim-witted mind to understand the concepts of hashing, hashtable/hashmap, and hash functions? Some examples would really help.

like image 968
Mohit Deshpande Avatar asked Jun 18 '10 12:06

Mohit Deshpande


People also ask

What is hash function and example?

A Hash Function is a function that converts a given numeric or alphanumeric key to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a significant number or string to a small integer that can be used as the index in the hash table.

What is hash function?

Definition: A hash function is a function that takes a set of inputs of any arbitrary size and fits them into a table or other data structure that contains fixed-size elements.


2 Answers

The Wikipedia article will have a lot of technical information, but a simplistic view of hashing is something like the following.

Imagine that there's a magical function that can give a number to any object. Given the same object, it always return the same number.

Immediately now you have a quick way to test if two objects are the same: ask this function for their numbers and compare. If they're different, then they're not the same.

But what if they have the same number? Could two different objects have the same number?

Yes, this is possible in most scenario. Let's say that the function can only give numbers between 1..10, for example, and there are 100 different objects. Then of course some different objects must have the same number. This is what is called a "collision". A "collision" makes our quick equality test not as useful, so as much as possible we want to minimize its happening. A good magical function is one that would try to minimize the number of "collision".

So what else can you do with this number? Well, you can use it to index an array. Given an object, you can put it at the index given by the number from this magical function. This array is essentially what a hashtable is; this magical function is a hash function.

like image 172
polygenelubricants Avatar answered Oct 09 '22 11:10

polygenelubricants


A hash function is a way to create a compact representation of an arbitrarily large amount of data. In java with the hashcode method this means somehow describing the state of your object (no matter how large) in an int (4 bytes). And is usually written to be a fairly fast as explained below.

To simplify in hashtables/hashmaps the hashcode serves as sort of a cheap equals. Take two objects a and b of type Foo lets says to figure out if a.equals(b) takes 500 ms where as calculating a (efficient) hashcode only take 10ms. So if we want to know if a.equals(b) instead of doing that directly first we will look at the hashcodes and ask does a.hashCode() == b.hashCode(). Note that this will take only 20ms in our example.

Because of the API definition of hashcode we know that if the hashcode of a is not equal to b then a.equals(b) should never be true. So in our above test if we see the hashcodes are unequal then we never need to do the longer .equals() test, this is why you should always override hashCode and equals together.

You may also see references about writing "good" or "well distributed" hashcodes. This has to do with the fact that the inverse of the previous statements about hashcode and equals is not true. More specifically a.hashCode() == b.hashCode() does not necessarily imply a.equals(b) So the idea of a good hashcode is you reduce the likelyhood of a.hashCode() == b.hashCode() when a.equals(b) is false. You may have seen this referred to as a collision of a hash function.

Back to hashmaps/tables. These are based on key/value pairs. So when you add or retrieve a value you will supply a key. So the first thing the map has to do is look for the key, which means finding something that .equals() the key you provide. But as we discussed above .equals() can be incredibly slow which means comparisons can be greatly sped up by checking hashcodes first. Since when the hashcodes are well distributed you should know quickly when x is definitely != y.

Now in addition to the comparison hashmaps/tables actually use the hashcodes to organize their internal storage of the data, however I think that is beyond the scope of what you are looking to understand at this point.

like image 22
M. Jessup Avatar answered Oct 09 '22 10:10

M. Jessup