Code speaks louder than words, so:
final int size = 100;
Map<Integer, String> m = new HashMap<>(size);
for (int i = 0; i < size; i++) m.put(i, String.valueOf(i));
Why is the HashMap internally calling resize() 2 times? (Credit to Andreas for identifying that the JVM uses HashMaps internally; 19 of the 21 resize() calls I originally counted came from other processes, leaving 2 from my own code.)
Two resize() calls are still not acceptable for my application. I need to optimize this.
If I am a new Java developer, my first intuitive guess at what "capacity" means in the HashMap constructor is that it is the number of elements that I (the consumer of HashMap) am going to put into the Map. But this is not true.
If I want to optimize my usage of HashMap so that it does not need to resize itself at all, then I need to know the internals of HashMap intimately enough to know exactly how sparse the HashMap bucket array needs to be. This is strange in my opinion. HashMap should implicitly do this for you; that is the whole point of encapsulation in OOP.
Note: I have confirmed that resize() is the bottleneck for my application's use case, so that is why my goal is to reduce the number of calls to resize().
The question:
If I know the exact number of entries I am going to put into the map beforehand, what capacity do I choose to prevent any extra resize() operations? Something like size * 10? I would also like some background on why HashMap is designed this way.
Edit: I am getting asked a lot why this optimization is necessary. My application is spending a non-trivial amount of CPU time in HashMap.resize(). The HashMaps my application uses are initialized with a capacity equal to the number of elements that we put into them. Therefore, if we can reduce the resize() calls (by choosing a better initial capacity), then my application's performance is improved.
In Oracle JDK 8, HashMap resizes when the size is > threshold (capacity * load factor). With a capacity of 16 and the default load factor of 0.75, resizing (to a capacity of 32) takes place when the 13th entry is put.
As you are likely aware, HashMaps are resized dynamically at runtime, based on the number of entries in the map. By default, HashMap uses a load factor of 75%.
As soon as the 13th element (key-value pair) is put into the HashMap, it grows from the default 2^4 = 16 buckets to 2^5 = 32 buckets. Another way to look at it: the HashMap increases its capacity once the ratio of entries to buckets reaches 0.75.
HashMap does not shrink when data is removed. Even if all keys are removed from the HashMap, the size of its internal table does not change.
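Both behaviours can be observed with a small sketch of mine (not part of the original answers) that peeks at HashMap's private table field via reflection; on JDK 9+ it may need --add-opens java.base/java.util=ALL-UNNAMED to run:
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class TableSizeDemo {
    // Reads the length of HashMap's private bucket array (null until the first put).
    static int buckets(Map<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        Map<Integer, String> m = new HashMap<>(); // default capacity 16, threshold 12
        for (int i = 1; i <= 13; i++) {
            m.put(i, String.valueOf(i));
            System.out.println("entries=" + i + " buckets=" + buckets(m));
        }
        // Prints 16 buckets for entries 1..12 and 32 buckets once the 13th entry is added.
        m.clear(); // removing every entry...
        System.out.println("after clear: buckets=" + buckets(m)); // ...still 32 buckets
    }
}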
The default load factor is 0.75, i.e. 3/4, which means that the internal hash table is resized once more than 75% of its capacity is in use. Note that the requested capacity is rounded up to the next power of two, so new HashMap<>(100) actually allocates a table of 128 buckets with a resize threshold of 128 * 0.75 = 96; the table therefore grows again on the 97th put.
FYI: resize() is only called twice. Once when the first value is added (the table is allocated lazily), and once when the threshold is exceeded.
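That arithmetic can be reproduced with a short sketch; tableSizeFor below is my re-implementation of the power-of-two rounding that HashMap performs internally, not a call into the JDK:
public class ThresholdMath {
    // Rounds the requested capacity up to the next power of two, as HashMap does.
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= (1 << 30)) ? (1 << 30) : n + 1;
    }

    public static void main(String[] args) {
        int requested = 100;
        int capacity = tableSizeFor(requested);   // 128
        int threshold = (int) (capacity * 0.75f); // 96
        System.out.println("capacity = " + capacity + ", threshold = " + threshold);
        // resize() #1: the first put() allocates the 128-bucket table.
        // resize() #2: the 97th put() pushes size past the threshold of 96.
    }
}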
To prevent resizing, you need to ensure that the 100th value will not cause a resize, i.e. size <= capacity * 0.75, aka size <= capacity * 3/4, aka size * 4/3 <= capacity. So, to be sure:
capacity = size * 4/3 + 1
With size = 100, that means capacity = 134.
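Wrapping that formula in a helper keeps the computation in one place. This is only a sketch under the assumptions above (default load factor of 0.75), and the helper name is mine; Guava ships Maps.newHashMapWithExpectedSize, and JDK 19+ added HashMap.newHashMap(int numMappings), both of which do an equivalent computation for you.
import java.util.HashMap;
import java.util.Map;

public final class Maps2 {
    private Maps2() {}

    // Returns a HashMap whose initial capacity is large enough that putting
    // expectedSize entries never triggers a resize beyond the initial table
    // allocation (assumes the default load factor of 0.75; ignores overflow
    // for very large sizes).
    public static <K, V> Map<K, V> withExpectedSize(int expectedSize) {
        int capacity = expectedSize * 4 / 3 + 1; // size * 4/3 + 1 from above
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        Map<Integer, String> m = withExpectedSize(100); // capacity 134
        for (int i = 0; i < 100; i++) m.put(i, String.valueOf(i)); // no second resize
    }
}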