 

Rehashing process in hashmap or hashtable

How is the rehashing process done in a hashmap or hashtable when the size exceeds the maximum threshold value?

Are all pairs just copied to a new array of buckets?

EDIT:

What happens to the elements in the same bucket (in a linked list) after rehashing? I mean, will they remain in the same bucket after rehashing?

asked May 18 '12 by a Learner



3 Answers

The maximum threshold in the question is called the load factor.

It is advisable to keep the load factor around 0.75. The load factor is defined as m/n, where n is the total number of buckets in the hash table and m is the number of entries that can be inserted before an increase in the size of the underlying data structure is required.

Rehashing can be done in two cases:

  1. When the current ratio m'/n grows beyond the load factor

  2. When the ratio m'/n falls to a very low value, say 0.1

In both cases m' is the current number of entries. Both cases require moving the existing entries into a bigger or a smaller hash table.

In the question's context, rehashing is the process of applying a hash function to the entries to move them into another hash table. It is possible to reuse the hash function that was used earlier or to use a new one altogether.
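As a rough sketch of this process (not the JDK's actual implementation; the class and method names here, such as `SimpleHashMap` and `resize`, are illustrative), a chained hash table that rehashes once its load factor is exceeded might look like:

```java
import java.util.Objects;

// Illustrative chained hash table that doubles its bucket array and
// re-inserts every entry once size exceeds capacity * LOAD_FACTOR.
// Assumes non-null keys for brevity.
public class SimpleHashMap<K, V> {
    private static final double LOAD_FACTOR = 0.75;

    private static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private Node<K, V>[] buckets = (Node<K, V>[]) new Node[8];
    private int size = 0;

    private int indexFor(Object key, int capacity) {
        return (key.hashCode() & 0x7fffffff) % capacity;
    }

    public void put(K key, V value) {
        if (size + 1 > buckets.length * LOAD_FACTOR) resize();
        int i = indexFor(key, buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { n.value = value; return; }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // insert at head
        size++;
    }

    public V get(Object key) {
        for (Node<K, V> n = buckets[indexFor(key, buckets.length)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    // Rehashing: allocate a bigger array and re-insert every entry,
    // recomputing each entry's bucket index for the new capacity.
    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = buckets;
        buckets = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {
            for (Node<K, V> n = head; n != null; ) {
                Node<K, V> next = n.next;
                int i = indexFor(n.key, buckets.length);
                n.next = buckets[i];   // head insertion into the new table
                buckets[i] = n;
                n = next;
            }
        }
    }

    public int size() { return size; }
    public int capacity() { return buckets.length; }
}
```

Note that the entries themselves are re-inserted, so every pair does get moved to the new bucket array, but each pair's bucket index is recomputed against the new capacity rather than simply copied across.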

Please note: Rehashing is also done when a collision occurs. (It's a way of handling collisions too.)

To add some more context and a detailed discussion please visit my blog Hashing Basics

answered Sep 28 '22 by dharam

Rehashing of a hash map is done when the number of elements in the map reaches the maximum threshold value.

Usually the load factor value is 0.75 and the default initial capacity is 16. Once the number of elements reaches or crosses 0.75 times the capacity, rehashing of the map takes place. With the defaults, rehashing occurs when the number of elements reaches 12 (0.75 * 16 = 12).
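The threshold arithmetic above can be checked directly (class and method names here are illustrative, not part of the JDK API):

```java
public class ThresholdDemo {
    // Rehash threshold = capacity * load factor, using the default
    // HashMap values mentioned in the text: capacity 16, load factor 0.75.
    static int threshold(int capacity, double loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75)); // 12
    }
}
```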

When rehashing occurs, a new hash function, or even the same hash function, could be used, but the buckets in which the values are placed can change. Basically, when rehashing occurs the number of buckets is roughly doubled, and hence the new index at which each value has to be put changes.
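Why the index changes is easy to see with the power-of-two masking scheme that `java.util.HashMap` uses internally (`index = hash & (capacity - 1)`); the demo class below is illustrative:

```java
public class IndexDemo {
    // Bucket index via power-of-two masking, HashMap-style:
    // capacity must be a power of two for this to be a valid mask.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int hash = "someKey".hashCode();
        int oldIndex = indexFor(hash, 16);
        int newIndex = indexFor(hash, 32);
        // After the capacity doubles, an entry either stays at oldIndex or
        // moves to oldIndex + 16, depending on one extra bit of the hash.
        System.out.println(oldIndex + " -> " + newIndex);
    }
}
```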

While rehashing, the linked list for each bucket gets reversed in order (in Java 7 and earlier HashMap implementations). This happens because that HashMap implementation inserts new elements at the head of the list rather than at the tail. So when rehashing occurs, it reads each element and inserts it into the new bucket at the head; inserting each subsequent element from the old map at the head of the new list results in the reversal of the linked list.
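The reversal can be reproduced with a tiny stand-alone sketch of that head-insertion transfer loop (this mimics the pre-Java-8 behaviour described above; the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class ReversalDemo {
    static class Node { int v; Node next; Node(int v, Node next) { this.v = v; this.next = next; } }

    // Moves every node of an old bucket into a new bucket using HEAD
    // insertion, as the pre-Java-8 HashMap transfer loop did, and
    // returns the traversal order of the new bucket.
    static List<Integer> transfer(int... oldOrder) {
        Node old = null;
        for (int i = oldOrder.length - 1; i >= 0; i--) old = new Node(oldOrder[i], old);

        Node newBucket = null;
        for (Node n = old; n != null; ) {
            Node next = n.next;
            n.next = newBucket;   // insert at the head of the new list
            newBucket = n;
            n = next;
        }

        List<Integer> result = new ArrayList<>();
        for (Node n = newBucket; n != null; n = n.next) result.add(n.v);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(transfer(1, 2, 3)); // [3, 2, 1]
    }
}
```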

If multiple threads operate on the same hash map concurrently, this can result in an infinite loop.

Detailed explanation stating how infinite loop occurs in the above case can be found here : http://mailinator.blogspot.hu/2009/06/beautiful-race-condition.html

If the elements inserted in the map have to be sorted with respect to the keys, then TreeMap can be used. But HashMap is more efficient if the order of keys doesn't matter.

answered Sep 28 '22 by Melwin

Hashing – Rehashing and Race condition

Basically, when a hash map is created, the collection assigns it a default capacity (2^4, i.e. 16). Later, as elements are added to the map and the count approaches the initially defined capacity, rehashing is required to retain performance.

There is a load factor defined for the collection (0.75 is said to be good), and it specifies a good trade-off between time and space.

  • LARGER load factor => lower space consumption but higher lookup cost
  • SMALLER load factor => larger space consumption relative to the required number of elements, but faster lookups

The Java specification suggests that a good load factor value is 0.75.

Hence, suppose you need to store at most 10 elements in the hash; then, considering the good load factor of 0.75, rehashing would occur after adding roughly 7 elements (10 × 0.75) to the collection. If your requirement never exceeds 7 elements, rehashing would never occur.

If a really large number of elements is going to be stored in the HashMap, it is always good to create the HashMap with sufficient initial capacity; this is more efficient than letting it perform automatic rehashing.
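A common way to pre-size (the `expected` count and key names here are just example values) is to divide the expected element count by the load factor, so the map never crosses its rehash threshold while filling up:

```java
import java.util.HashMap;
import java.util.Map;

public class PreSizeDemo {
    public static void main(String[] args) {
        int expected = 10_000;
        // Pre-size so the expected entries fit under the 0.75 load factor;
        // the map then never has to rehash while it fills up.
        Map<String, Integer> map = new HashMap<>((int) (expected / 0.75) + 1);
        for (int i = 0; i < expected; i++) map.put("key" + i, i);
        System.out.println(map.size()); // 10000
    }
}
```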

RACE condition: while rehashing, the elements stored in a given bucket's linked list get reversed in order. If two threads hit this race condition at the same time, there is a chance the second thread goes into an infinite loop while traversing, since the order has been changed.

answered Sep 28 '22 by Manjul