When two unequal objects have the same hash value, they collide in the hash table, because both objects want to be in the same slot (sometimes called a bucket). The hash table implementation must resolve such collisions.
1) If two objects are equal (i.e. the equals() method returns true), they must have the same hash code.
2) If the hashCode() method is called multiple times on the same object, it must return the same result every time.
3) Two different objects can have the same hash code.
If two objects have the same hash code, it doesn't mean that they are equal. Overriding equals() alone will break your code with hash-based data structures like HashSet, HashMap, Hashtable, etc. Overriding hashCode() alone doesn't help either, because the default equals() still compares memory addresses (object identity) when comparing two objects.
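As a minimal sketch of a class that honors this contract (the Point class here is hypothetical, purely for illustration), both methods are overridden and derived from the same fields:

import java.util.Objects;

// Hypothetical key class illustrating the equals()/hashCode() contract.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;   // equality is based on the same fields...
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);     // ...that the hash code is computed from
    }
}

Because both methods look at the same fields, rule 1 holds automatically: equal points always produce equal hash codes.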
1) HashMap handles collisions by using a linked list to store the map entries that end up in the same array location (bucket). 2) From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap use a balanced tree instead of a linked list once a bucket collects too many collisions (eight entries, by default).
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket whose identifier is the hash code of the key. For example: the hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more than one key-value pair.)
When you look up a value in the hashmap by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and compare the key that you gave with the keys of all pairs in the bucket, comparing them with equals().
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
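To make this mechanism concrete, here is a deliberately simplified sketch in the same spirit (a toy class, not the real HashMap code; the fixed 16-bucket table and Pair type are simplifications):

import java.util.ArrayList;
import java.util.List;

// Toy illustration of the bucket mechanism described above.
class ToyMap<K, V> {
    private static class Pair<K, V> {
        final K key;
        V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final List<Pair<K, V>>[] buckets = new List[16];

    private int bucketOf(K key) {
        // Map the key's hash code to one of the 16 buckets.
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void put(K key, V value) {
        int b = bucketOf(key);
        if (buckets[b] == null) buckets[b] = new ArrayList<>();
        for (Pair<K, V> p : buckets[b]) {
            if (p.key.equals(key)) { p.value = value; return; } // replace existing value
        }
        buckets[b].add(new Pair<>(key, value));                 // or store a new pair
    }

    public V get(K key) {
        int b = bucketOf(key);
        if (buckets[b] == null) return null;
        for (Pair<K, V> p : buckets[b]) {
            if (p.key.equals(key)) return p.value; // equals() selects the right pair
        }
        return null;
    }
}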
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:

If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).

If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.
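Here is a hypothetical sketch of what happens when the first requirement is violated - equals() is overridden but hashCode() is not (the BrokenKey class is invented for this demonstration):

import java.util.HashMap;
import java.util.Map;

// Hypothetical broken key: equal instances usually have different hash codes,
// because hashCode() is still the identity-based default from Object.
final class BrokenKey {
    final String name;
    BrokenKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenKey && ((BrokenKey) o).name.equals(name);
    }
}

public class BrokenKeyDemo {
    public static void main(String[] args) {
        Map<BrokenKey, Integer> map = new HashMap<>();
        map.put(new BrokenKey("a"), 1);
        // An equal key, but (almost certainly) a different hash code: the map
        // looks in the wrong bucket and prints null.
        System.out.println(map.get(new BrokenKey("a")));
    }
}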
Your third assertion is incorrect.
It's perfectly legal for two unequal objects to have the same hash code. HashMap uses the hash code as a "first pass filter" so that the map can quickly find possible entries with the specified key. The keys with the same hash code are then tested for equality with the specified key.
You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 2^32 possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
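A quick demonstration of the "first pass filter": "Aa" and "BB" are a well-known pair of unequal strings with the same hash code, and a HashMap keeps both entries happily:

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // Two unequal keys with identical hash codes (both are 2112).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2 - equals() told them apart
    }
}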
HashMap is an array of Entry objects. Consider HashMap as just an array of objects. Have a look at what this Object is:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;          // the key (fixed once the entry is created)
    V value;              // the mapped value (overwritten on a repeated put)
    Entry<K,V> next;      // the next entry in the same bucket's chain, or null
    final int hash;       // the cached hash code of the key
    …
}
Each Entry object represents a key-value pair. The field next refers to another Entry object if a bucket has more than one Entry.

Sometimes it might happen that hash codes for 2 different objects are the same. In this case, the two objects will be saved in one bucket and will be presented as a linked list. The entry point is the most recently added object. This object refers to another object with the next field, and so on. The last entry refers to null.
When you create a HashMap with the default constructor

HashMap hashMap = new HashMap();

the array is created with size 16 and a default load factor of 0.75.
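Both numbers can also be passed explicitly; for example, this is equivalent to the defaults:

// 16 buckets; resize once 75% of them would be filled (threshold = 16 * 0.75 = 12).
HashMap<String, Integer> hashMap = new HashMap<>(16, 0.75f);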
Putting an element:
1) hash & (arrayLength - 1) determines where the element should be placed (the bucket number).
2) If the bucket is empty, the new Entry is simply stored there. If the HashMap already contains an equal key, then the value gets overwritten.
3) If the bucket already has at least one element, the new one gets added and placed in the first position of the bucket. Its next field refers to the old element (see the sketch below).

Getting an element:
1) The bucket number is computed the same way: hash & (arrayLength - 1).
2) The entries in that bucket are traversed, comparing keys with equals(), until the matching Entry is found.
3) If a desired element is not found, return null.
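A sketch of the insertion step described above (mirroring the pre-Java-8 head-insertion behavior; the Entry constructor and its argument order here are hypothetical):

// Sketch: insert into a bucket's chain; returns the new head of the chain.
static <K, V> Entry<K, V> putInBucket(Entry<K, V> head, int hash, K key, V value) {
    for (Entry<K, V> e = head; e != null; e = e.next) {
        if (e.hash == hash && (e.key == key || key.equals(e.key))) {
            e.value = value; // key already present: overwrite the value
            return head;     // the chain itself is unchanged
        }
    }
    // New entry goes in the first position; its next field refers to the old head.
    return new Entry<>(key, value, head, hash); // hypothetical constructor order
}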
You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
To summarize:

HashMap works on the principle of hashing.

put(key, value): HashMap stores both the key and the value object as a Map.Entry. It applies hashCode() to the key to find the bucket. If there is a collision, HashMap uses a LinkedList to store the objects.

get(key): HashMap uses the key object's hash code to find the bucket location and then calls the key's equals() method to identify the correct node in the LinkedList and return the associated value object for that key.
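In everyday code that boils down to:

import java.util.HashMap;
import java.util.Map;

public class SummaryDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);                 // the key's hashCode() selects the bucket
        ages.put("alice", 31);                 // equal key: the value is overwritten
        System.out.println(ages.get("alice")); // 31
        System.out.println(ages.get("bob"));   // null - no such key
    }
}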
Here is a rough description of HashMap's mechanism, for the Java 8 version (it might be slightly different from Java 6).
The bucket index is calculated via hash() on the key, and it decides which bucket of the hashtable to use for a given key.

Internal classes:
- Map.Entry - represents a single key/value entry in the map.
- HashMap.Node - linked list version of a node. It could represent the head of a bucket, or a node in the bucket's chain.
- HashMap.TreeNode - tree version of a node, used once a bucket holds many collisions.

Internal fields:
- Node[] table - the bucket table (the heads of the chains).
- Set<Map.Entry> entrySet - set of entities.
- int size - count of entities.
- float loadFactor - how full the hash table is allowed to get before it is resized.
- int threshold - the size at which the table will be resized: threshold = capacity * loadFactor.

Internal methods:
- int hash(key) - calculates the hash of the key.

How to map a hash to a bucket? Use the following logic:
static int hashToBucket(int tableSize, int hash) {
    // tableSize is a power of two, so (tableSize - 1) is an all-ones bit mask.
    return (tableSize - 1) & hash;
}
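This mask only works because the table size is always kept a power of two: tableSize - 1 is then a block of 1-bits, and the & behaves like a modulo even for negative hashes. A quick check:

int bucket = hashToBucket(16, "Aa".hashCode());        // 2112 & 15 == 0
System.out.println(bucket == Math.floorMod(2112, 16)); // true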
In a hash table, capacity means the bucket count; it can be obtained from table.length. It could also be calculated via threshold and loadFactor, and thus need not be defined as a class field. The effective capacity can be obtained via capacity().

When the threshold is reached, the hashtable's capacity (table.length) is doubled, and then a re-hash is performed on all elements to rebuild the table.

Get and put are O(1), because:
- Locating the bucket from the hash is O(1).
- On average each bucket holds only a few entries, so searching within it is O(1).
- Even when a bucket has been converted to a tree, the amortized cost remains O(1), not O(log N).
The hashcode determines which bucket for the hashmap to check. If there is more than one object in the bucket, then a linear search is done to find which item in the bucket equals the desired item (using the equals() method).
In other words, if you have a perfect hashcode then hashmap access is constant: you will never have to iterate through a bucket (technically you would also have to have MAX_INT buckets; the Java implementation may share a few hash codes in the same bucket to cut down on space requirements). If you have the worst hashcode (one that always returns the same number), then your hashmap access becomes linear, since you have to search through every item in the map (they are all in the same bucket) to get what you want.
Most of the time a well written hashcode isn't perfect but is unique enough to give you more or less constant access.
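To see the worst case in action, here is a hypothetical key whose hashCode() always returns the same number (note that from Java 8 onwards an oversized bucket is converted to a tree, which mitigates but does not remove the cost):

import java.util.HashMap;
import java.util.Map;

// Hypothetical worst-case key: every instance hashes to the same bucket.
final class WorstKey {
    final int id;
    WorstKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof WorstKey && ((WorstKey) o).id == id;
    }

    @Override
    public int hashCode() { return 42; } // all keys collide on purpose
}

public class WorstCaseDemo {
    public static void main(String[] args) {
        Map<WorstKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) map.put(new WorstKey(i), i);
        // Every lookup lands in the same giant bucket, so access is no
        // longer constant time.
        System.out.println(map.get(new WorstKey(9_999)));
    }
}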
You're mistaken on point three. Two entries can have the same hash code but not be equal. Take a look at the implementation of HashMap.get from OpenJDK. You can see that it checks that the hashes are equal and the keys are equal. Were point three true, then it would be unnecessary to check that the keys are equal. The hash code is compared before the key because the former is a more efficient comparison.
If you're interested in learning a little more about this, take a look at the Wikipedia article on hash table collision resolution. The OpenJDK implementation actually uses the "bucket" approach (separate chaining) described in the other answers; open addressing is a subtly different mechanism, used for example by IdentityHashMap.