In HashMap, why is the threshold value (the next size value at which to resize) capacity * load factor? Why is it not equal to the size or capacity of the map?
For example, initially the default capacity = 16, load factor = 0.75, and hence threshold = (capacity * load factor) = (16 * 0.75) = 12.
The map resizes when we add the 13th element. Why is that so? Why did the authors of HashMap decide to keep the threshold at capacity * load factor
(which is 12)? Why not the same as the capacity (which is 16)?
Why not keep the threshold equal to the capacity, so that rehashing only takes place when the HashMap gets full?
The threshold of a HashMap is approximately the product of its current capacity and its load factor. Rehashing is the process of re-placing the already stored entries. Whenever the HashMap reaches its threshold, rehashing takes place: a new, larger bucket array is allocated, and all existing key-value pairs are moved into it after their bucket indexes are recalculated from their hash codes. This process of rehashing is both space and time consuming.
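To make the idea concrete, here is a minimal separate-chaining sketch of what a resize/rehash does conceptually. This is not the real java.util.HashMap implementation (the class name SimpleHashMap, the modulo-based index calculation, and the omission of duplicate-key handling are my own simplifications for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified chained hash table, used ONLY to illustrate resizing/rehashing.
// Duplicate keys and removal are intentionally not handled.
class SimpleHashMap<K, V> {
    private static final float LOAD_FACTOR = 0.75f;

    private static class Entry<K, V> {
        final K key; final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private List<Entry<K, V>>[] buckets = newTable(16);
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> List<Entry<K, V>>[] newTable(int capacity) {
        return new List[capacity];
    }

    public void put(K key, V value) {
        // Resize BEFORE the table is full: threshold = capacity * load factor.
        if (size + 1 > (int) (buckets.length * LOAD_FACTOR)) {
            resize();
        }
        insert(buckets, key, value);
        size++;
    }

    private void resize() {
        List<Entry<K, V>>[] newBuckets = newTable(buckets.length * 2);
        for (List<Entry<K, V>> bucket : buckets) {
            if (bucket == null) continue;
            // Every existing entry must be re-placed, because its bucket
            // index depends on the table length.
            for (Entry<K, V> e : bucket) {
                insert(newBuckets, e.key, e.value);
            }
        }
        buckets = newBuckets;
    }

    private void insert(List<Entry<K, V>>[] table, K key, V value) {
        int index = (key.hashCode() & 0x7fffffff) % table.length;
        if (table[index] == null) table[index] = new ArrayList<>();
        table[index].add(new Entry<>(key, value));
    }
}
```

The inner loop over every stored entry is exactly why rehashing is described as space and time consuming: the cost grows with the number of entries already in the map.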
In Oracle JDK 8, HashMap resizes when the size is > threshold (capacity * load factor). With a capacity of 16 and the default load factor of 0.75, resizing (to a capacity of 32) takes place when the 13th entry is put.
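If it helps to see the numbers, here is a small standalone sketch (my own, not JDK code) that, assuming the default load factor of 0.75, prints the threshold for each capacity doubling and the entry that triggers the resize:

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f;   // HashMap's default load factor
        int capacity = 16;          // HashMap's default initial capacity

        // Each resize is triggered by the (threshold + 1)-th entry.
        for (int i = 0; i < 5; i++) {
            int threshold = (int) (capacity * loadFactor);
            System.out.printf("capacity=%-4d threshold=%-4d resize on entry #%d%n",
                              capacity, threshold, threshold + 1);
            capacity *= 2;
        }
    }
}
```

With the defaults this prints capacity=16, threshold=12, resize on entry #13; then capacity=32, threshold=24, resize on entry #25; and so on.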
The load factor is a threshold ratio: when the number of stored entries divided by the current capacity crosses it, the capacity increases so that the operational complexity of the HashMap remains O(1), meaning that retrieval and insertion keep taking (amortized) constant time.
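Both knobs are exposed through HashMap's public constructors, so you can see the trade-off directly. A short sketch (the variable names are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorExample {
    public static void main(String[] args) {
        // Default: capacity 16, load factor 0.75 -> first resize on the 13th entry.
        Map<String, Integer> defaults = new HashMap<>();

        // A higher load factor delays resizing: less wasted space, longer buckets.
        Map<String, Integer> dense = new HashMap<>(16, 0.90f);

        // A lower load factor resizes earlier: more space, shorter buckets.
        Map<String, Integer> sparse = new HashMap<>(16, 0.50f);

        defaults.put("a", 1);
        dense.put("b", 2);
        sparse.put("c", 3);
    }
}
```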
Javadoc, Javadoc, Javadoc. That is the first place you look. For HashMap it says:
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
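The last sentence of that quote is the basis for the common presizing trick. A hedged sketch (the class name, entry count, and the +1 safety margin are my own choices), which sizes the map so that no rehash occurs while filling it:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expectedEntries = 1_000;
        float loadFactor = 0.75f;

        // Per the Javadoc: if the initial capacity is greater than
        // expectedEntries / loadFactor, no rehash operations will ever occur
        // while holding expectedEntries entries.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor) + 1;

        Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);   // no resize happens during this loop
        }
    }
}
```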
As for the theory of hash maps: if your map is full, you're doing something very, very wrong. By that time you're likely at O(sqrt(N)) on lookups with random data, which is bad. You never want your hash map to be full. But a very sparse map will waste too much space (as you've noted) and will take too long to iterate through. Hence there should be a load factor that is less than 1 for most use cases.
Note: The "wasted space" is proportional to the size of the map, and inversely proportional to the load factor. However lookup times have a more complex expected performance function. This means that the same load factor will not work for different size hash maps - as it will mean different scale tradeoffs.
A general overview of the tradeoffs can be found in Knuth "The Art of Computer Programming" vol 3.
From a theory perspective, the likelihood of remaining collision-free in a nearly full hash table is very low, so hash tables are resized to maintain their desired O(1) lookup property: fewer collisions mean more direct access to entries and less searching.
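A quick way to see how unlikely "no collisions" is: the probability that n uniformly hashed keys land in n distinct buckets out of m is the birthday-problem product of (1 - k/m). A small sketch of my own, using the default capacity of 16:

```java
public class CollisionProbability {
    public static void main(String[] args) {
        int buckets = 16;

        // P(no collision) for i keys = product of (1 - k/buckets) for k = 1 .. i-1.
        double noCollision = 1.0;
        for (int i = 1; i <= buckets; i++) {
            System.out.printf("%2d keys in %d buckets -> P(no collision) = %.3f%n",
                              i, buckets, noCollision);
            noCollision *= 1.0 - (double) i / buckets;
        }
    }
}
```

Even at the default threshold of 12 entries, collisions are already overwhelmingly likely; resizing well before the table is full is about keeping the chains short, not about avoiding collisions entirely.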