 

In HashMap, why is the threshold value (the next size value at which to resize) capacity * load factor, and not equal to the size or capacity of the map?


For example, the default initial capacity = 16 and the default load factor = 0.75, hence threshold = capacity * load factor = 16 * 0.75 = 12.

The map resizes when we add the 13th element. Why is that? Why did the authors of HashMap decide to use capacity * load factor (which is 12) rather than the capacity itself (which is 16)?

Why not keep the threshold equal to the capacity, so that rehashing only takes place when the HashMap is completely full?
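
For reference, here is a small sketch that observes this behaviour by peeking at HashMap's private table and threshold fields via reflection. These field names are an OpenJDK implementation detail, and on recent JDKs the program needs to be run with --add-opens java.base/java.util=ALL-UNNAMED:

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class ThresholdDemo {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> map = new HashMap<>();

        // "table" and "threshold" are private implementation details of
        // OpenJDK's HashMap; on recent JDKs this reflective access needs
        // --add-opens java.base/java.util=ALL-UNNAMED
        Field tableField = HashMap.class.getDeclaredField("table");
        Field thresholdField = HashMap.class.getDeclaredField("threshold");
        tableField.setAccessible(true);
        thresholdField.setAccessible(true);

        for (int i = 1; i <= 13; i++) {
            map.put("key" + i, i);
            Object[] table = (Object[]) tableField.get(map);
            // expected: capacity=16, threshold=12 up to the 12th put,
            // then capacity=32, threshold=24 after the 13th put
            System.out.printf("size=%2d capacity=%2d threshold=%2d%n",
                    map.size(), table.length, thresholdField.getInt(map));
        }
    }
}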

Asked by niiraj874u on Dec 09 '14.


People also ask

What is the threshold value in a HashMap?

The threshold of a HashMap is approximately the product of the current capacity and the load factor. Rehashing is the process of re-computing the bucket positions of already stored entries when the table is resized.

What happens when the threshold is reached in a HashMap?

Whenever a HashMap reaches its threshold, rehashing takes place. Rehashing is a process in which a new, larger bucket array is created and all existing entries (key-value pairs) are placed into it after their bucket positions are recalculated. This process costs both space and time.

How does resizing happen in a HashMap?

In Oracle JDK 8, HashMap resizes when the size is > threshold (capacity * load factor). With a capacity of 16 and the default load factor of 0.75, resizing (to a capacity of 32) takes place when the 13th entry is put.

What happens to a HashMap in Java if its size exceeds the threshold defined by the load factor?

The load factor is a threshold: if the ratio of the number of stored elements to the current capacity crosses it, the capacity is increased so that the operational complexity of the HashMap remains O(1). An operational complexity of O(1) means that retrieval and insertion operations take constant time.
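
To make the rule in these snippets concrete, here is a toy chained hash map that mimics the described resize policy - grow and rehash once the size exceeds capacity * loadFactor. It is only a sketch of the idea with made-up names, not the actual java.util.HashMap implementation:

import java.util.ArrayList;
import java.util.List;

// Toy chained hash map illustrating the resize policy described above:
// grow and rehash as soon as size exceeds capacity * loadFactor.
public class ToyHashMap<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private List<Entry<K, V>>[] table;
    private final float loadFactor = 0.75f;
    private int threshold;   // capacity * loadFactor
    private int size;

    @SuppressWarnings("unchecked")
    public ToyHashMap() {
        table = new List[16];                              // default capacity 16
        threshold = (int) (table.length * loadFactor);     // 16 * 0.75 = 12
    }

    public void put(K key, V value) {
        int index = bucketIndex(key, table.length);
        if (table[index] == null) table[index] = new ArrayList<>();
        for (Entry<K, V> e : table[index]) {
            if (e.key.equals(key)) { e.value = value; return; }   // overwrite existing key
        }
        table[index].add(new Entry<>(key, value));
        if (++size > threshold) resize();                  // the rule being asked about
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        List<Entry<K, V>>[] old = table;
        table = new List[old.length * 2];                  // double the capacity
        threshold = (int) (table.length * loadFactor);
        for (List<Entry<K, V>> bucket : old) {             // rehash: redistribute all entries
            if (bucket == null) continue;
            for (Entry<K, V> e : bucket) {
                int index = bucketIndex(e.key, table.length);
                if (table[index] == null) table[index] = new ArrayList<>();
                table[index].add(e);
            }
        }
    }

    private int bucketIndex(Object key, int capacity) {
        return (key.hashCode() & 0x7fffffff) % capacity;   // non-negative bucket index
    }
}

With this policy, putting 13 entries into a fresh instance triggers one resize from 16 to 32 buckets, matching the behaviour described for the real HashMap with default settings.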


2 Answers

Javadoc, Javadoc, Javadoc. That is the first place you look. On the HashMap it says:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
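
As a practical aside, the last sentence of that Javadoc is how rehashing can be avoided entirely when the number of entries is known up front. A minimal sketch (the entry count here is just an example):

import java.util.HashMap;
import java.util.Map;

public class PreSizedMap {
    public static void main(String[] args) {
        int expectedEntries = 1_000;                  // hypothetical, for illustration only
        float loadFactor = 0.75f;

        // Per the Javadoc quoted above: choose an initial capacity of at least
        // expectedEntries / loadFactor and no rehash should ever occur.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);
        Map<String, String> map = new HashMap<>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            map.put("key" + i, "value" + i);          // stays under the threshold, so no resize
        }
    }
}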

As for the theory of hash maps - if your map is full, then you're doing something very, very wrong. By that time you're likely at O(sqrt(N)) on lookups with random data - BAD. You never want your hashmap to be full. But a very sparse map will waste too much space (as you've noted), and will take too long to iterate through. Hence there should be a load factor that is less than 1 for most use cases.

Note: The "wasted space" is proportional to the size of the map and inversely proportional to the load factor. Lookup times, however, have a more complex expected-performance function. This means that the same load factor will not work equally well for hash maps of different sizes - it implies different tradeoffs at different scales.


A general overview of the tradeoffs can be found in Knuth's "The Art of Computer Programming", Vol. 3.
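
To get a feel for the tradeoff without opening Knuth, here is a rough simulation under a uniform-random-hash assumption: it inserts capacity * loadFactor keys into a table and reports how many buckets stay empty (wasted space) and how long the chains get (lookup cost). The numbers are illustrative only:

import java.util.Random;

public class LoadFactorSimulation {
    public static void main(String[] args) {
        int capacity = 1 << 16;                        // 65,536 buckets (a power of two, like HashMap)
        double[] loadFactors = {0.5, 0.75, 1.0, 2.0};
        Random rnd = new Random(42);

        for (double lf : loadFactors) {
            int entries = (int) (capacity * lf);
            int[] buckets = new int[capacity];
            for (int i = 0; i < entries; i++) {
                // uniform random hash, reduced to a bucket index as HashMap does: h & (capacity - 1)
                buckets[rnd.nextInt() & (capacity - 1)]++;
            }
            int empty = 0, longest = 0;
            for (int b : buckets) {
                if (b == 0) empty++;
                longest = Math.max(longest, b);
            }
            double avgChain = (double) entries / (capacity - empty);
            System.out.printf("load factor %.2f: empty buckets %.1f%%, avg non-empty chain %.2f, longest chain %d%n",
                    lf, 100.0 * empty / capacity, avgChain, longest);
        }
    }
}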

Answered by Ordous.


From a theory perspective, the likelihood of maintaining zero collisions in a full hash table is very low, so hash tables are resized to preserve their desired O(1) lookup property - fewer collisions mean more direct access to entries and less searching.
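
A quick back-of-the-envelope calculation, assuming uniformly distributed hashes, shows how unlikely a collision-free full table is even at the default capacity of 16:

public class CollisionOdds {
    // probability that m uniformly random hashes fall into m distinct buckets out of n
    static double noCollision(int m, int n) {
        double p = 1.0;
        for (int i = 0; i < m; i++) {
            p *= (double) (n - i) / n;
        }
        return p;
    }

    public static void main(String[] args) {
        int buckets = 16;                              // default HashMap capacity
        for (int entries = 4; entries <= 16; entries += 4) {
            System.out.printf("%2d entries, %d buckets: P(no collision) = %.6f%n",
                    entries, buckets, noCollision(entries, buckets));
        }
    }
}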

Answered by C.B.