I have a project that handles a large amount of data being written to an Excel file. I store this data in a static HashMap in the form Map<List<String>, Integer>, where the size of the list is only ever 3. The number of entries in the Map, however, can range anywhere from 0 to 11,300.
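For reference, the map is declared roughly like this (the class and field names, and the sample key, are illustrative rather than the actual project code):

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReportData {
    // The key is always a 3-element list; the value is the figure written to Excel for that key
    private static final Map<List<String>, Integer> DATA = new HashMap<>();

    static void addSample() {
        DATA.put(Arrays.asList("North", "Widgets", "January"), 42);
    }
}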
The flow of this project is:
Load Map up with entries
Iterate Map and do stuff
What I recently found out about HashMap, though, is how it re-sizes when its capacity threshold is breached. So not only is my Map constantly re-sizing by large amounts, but it could very well have about 20,000 empty buckets by the time I clear the largest set of entries.
So I'm trying to micro-optimize this thing, and I'm stuck with a dilemma about how to do it. My two thoughts are to:
Set the initial capacity of the HashMap to a value that would allow it to re-size at most once
Re-initialize the HashMap with the average size expected for each new entry set, to limit re-sizing and let the garbage collector do some clean-up
My intuition tells me option two might be the more reasonable one, but it could still lead to a lot of re-sizing depending on the next entry set. Option one, on the other hand, limits re-sizing to a one-time operation but leaves me with literally thousands of null buckets.
Is one of my two proposed solutions better than the other, is there not much difference in memory improvement between the two, or is there some other solution I have overlooked (that does not involve changing the data structure)?
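Concretely, the two options would look something like this (0.75 is HashMap's default load factor; averageSize is an assumed figure, not a measured one):

// Option 1: size the map once so it should re-size at most once (ideally never).
// HashMap re-sizes when size exceeds capacity * loadFactor, hence the division by 0.75.
Map<List<String>, Integer> option1 = new HashMap<>((int) Math.ceil(11_300 / 0.75));

// Option 2: replace the map for each new entry set, sized for the average set.
int averageSize = 4_000;
Map<List<String>, Integer> option2 = new HashMap<>((int) Math.ceil(averageSize / 0.75));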
EDIT: Just for some context, I'm looking into this because occasionally the project runs out of heap memory, and I'm trying to determine how much of an impact this gigantic map is or could be having.
EDIT2: Just to clarify, the size of the Map itself is the larger value. The key size (i.e. the list) is ONLY ever 3.
The question and accepted response here were so wrong that I had to reply.
I have a project that handles a large amount of data being written to an Excel file. I store this data in a static HashMap in the form Map<List<String>, Integer>, where the size of the list is only ever 3. The number of entries in the Map, however, can range anywhere from 0 to 11,300.
Please don't take me wrong, but this is tiny! Don't even bother optimizing something like this. I quickly made a test: filling a HashMap with 11,300 elements takes less than a dozen milliseconds.
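Something along these lines (not the exact test code, but the same idea):

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FillTest {
    public static void main(String[] args) {
        long start = System.nanoTime();
        Map<List<String>, Integer> map = new HashMap<>();
        for (int i = 0; i < 11_300; i++) {
            // 3-element keys, mirroring the structure from the question
            map.put(Arrays.asList("a" + i, "b" + i, "c" + i), i);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(map.size() + " entries inserted in ~" + elapsedMs + " ms");
    }
}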
What I recently found out about HashMap, though, is how it re-sizes when its capacity threshold is breached. So not only is my Map constantly re-sizing by large amounts, but it could very well have about 20,000 empty buckets by the time I clear the largest set of entries.
...just to be clear: empty buckets consume almost no space; they are just null slots in the backing array. That's 8 bytes per slot on 64-bit machines, or 4 bytes per slot on 32-bit, so even 20,000 empty slots come to roughly 160 KB. We're talking about a few hundred kilobytes at the very most here.
Re-initialize the HashMap with the average size expected for each new entry set, to limit re-sizing and let the garbage collector do some clean-up.
It's not the average "size" of the entries, it's the average number of entries to be expected.
EDIT: Just for some context, I'm looking into this because occasionally the project runs out of heap memory, and I'm trying to determine how much of an impact this gigantic map is or could be having.
It's unlikely to be the map. Use a profiler! You can store millions of elements without a fuss.
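For example, running the JVM with -XX:+HeapDumpOnOutOfMemoryError and opening the resulting heap dump in VisualVM or Eclipse MAT will show exactly which objects actually dominate the heap.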
The accepted answer is bad
You could change these values on initialisation, so a size of 11,300 and a load factor of 1, meaning the map will not increase in size until your maximum has been met, which in your case, as I understand it, will be never.
This is not good advice. With the capacity equal to the expected number of items inserted and a load factor of one, you are bound to get a very large number of hash collisions. This will be a performance disaster.
Conclusion
If you don't know how stuff works, don't try to micro-optimize.
I did some research and ended up on this page: How does a HashMap work in Java.
The second-to-last heading deals with resizing overhead, stating that the defaults for a HashMap are a size of 16 and a load factor of 0.75.
You could change these values on initialisation, so a size of 11,300 and a load factor of 1, meaning the map will not increase in size until your maximum has been met, which in your case, as I understand it, will be never.
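In code, that suggestion would look something like this (a sketch, not code from the project):

// Capacity equal to the expected maximum number of entries, load factor of 1
Map<List<String>, Integer> map = new HashMap<>(11_300, 1.0f);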
I did a quick experiment, using this code:
public static void main(String[] args) throws Exception {
    // Pre-sized map: initial capacity 11,000,000 and load factor 1
    Map<String, Integer> map = new HashMap<>(11_000_000, 1);
    // Map<String, Integer> map = new HashMap<>(); // default-sized map, for comparison
    for (int i = 0; i < 11_000_000; i++) {
        map.put(String.valueOf(i), i);
    }
    System.out.println(map.size());
    // Keep the process alive so its memory use can be read from Task Manager
    Thread.sleep(9000);
}
I swapped between the two Map initialisations and checked the memory the process consumed in Task Manager.
With the initial size and load factor set, it uses ~1.45 GB of memory.
Without the values set, it uses ~1.87 GB of memory.
Re-initialising the Map every time, instead of clearing it, so that a potentially smaller Map takes its place will be slower, but it could temporarily leave you with more free memory.
You could also do both: re-initialise the Map with the initial size and load factor properties set, should you know the number of List objects for each cycle.
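As a rough sketch of the two reset strategies (the field and parameter names are illustrative):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Batches {
    private static Map<List<String>, Integer> data = new HashMap<>();

    // Clearing keeps the same map, so the backing table keeps its current capacity
    static void resetByClearing() {
        data.clear();
    }

    // Re-initialising lets the GC reclaim the old table; size the replacement for
    // the next batch, allowing for the default 0.75 load factor
    static void resetByReinitialising(int expectedEntries) {
        data = new HashMap<>((int) Math.ceil(expectedEntries / 0.75));
    }
}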
The article also suggests that the Java 8 HashMap, though potentially faster, could also have more memory overhead than in Java 7. It might be worth compiling the program under both versions and seeing which provides better memory behaviour. It would be interesting if nothing else.