
Memory-Efficient Way to Handle a Large HashMap

I have a project that is handling a large amount of data that is being written to an Excel file. I store this data in a static HashMap in the form Map<List<String>, Integer>, where the size of the list is only ever 3. The number of entries in the Map, however, can range anywhere from 0 to 11,300.

The flow of this project is:

  • Load Map up with entries

  • Iterate Map and do stuff

  • Clear map for next set of entries

What I recently found out about HashMap, though, is how it re-sizes when its size threshold is breached. So not only is my Map re-sizing constantly, and at dramatic lengths, but it could very well have about 20,000 empty entries by the time I clear the largest set of entries.
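
(As I understand it, the table doubles whenever the entry count exceeds capacity × load factor; with the defaults of 16 and 0.75, growing to 11,300 entries means about ten doublings, ending at a 16,384-slot table with a threshold of 16,384 × 0.75 = 12,288.)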

So I'm trying to micro-optimize this thing and I'm stuck with a dilemma of how to do this. My two thoughts are to:

  1. Set the initial capacity of the HashMap to a value large enough that it would, at most, ever re-size only once

  2. Reinitialize the HashMap with the average size that is expected for each new entry set to limit re-sizing and allow the garbage collector to do some clean up

My intuition tells me option two might be the more reasonable one, but it could still lead to a lot of re-sizing depending on the next entry set. Option one, on the other hand, limits re-sizing to a one-time operation but leaves me with literally thousands of null entries.
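
To make that concrete, the two options would look roughly like the sketch below; SizingOptions, MAX_EXPECTED, and AVERAGE_EXPECTED are just placeholders for figures I would have to pick from the data, and the division by 0.75 is only there so the default load factor's threshold covers the expected count:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SizingOptions {
    // Placeholder figures; the real values would come from the data.
    static final int MAX_EXPECTED = 11_300;
    static final int AVERAGE_EXPECTED = 4_000;

    // Option 1: create the map once, sized so it should never, or at most once,
    // have to re-size (threshold = capacity * 0.75 covers the maximum).
    static final Map<List<String>, Integer> preSized =
            new HashMap<>((int) Math.ceil(MAX_EXPECTED / 0.75));

    // Option 2: replace the map each cycle with one sized for an average batch,
    // letting the old, possibly oversized, table be garbage collected.
    static Map<List<String>, Integer> perCycle = newBatchMap();

    static Map<List<String>, Integer> newBatchMap() {
        return new HashMap<>((int) Math.ceil(AVERAGE_EXPECTED / 0.75));
    }
}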

Is one of my two proposed solutions better than the other? Is there not much difference in memory improvement between the two? Or is there some other solution I have overlooked (one that does not involve changing the data structure)?

EDIT: Just for some context, I want to do this because occasionally the project runs out of heap memory, and I'm trying to determine how much of an impact this gigantic map is or could be having.

EDIT2: Just to clarify, it's the Map itself that can get large; the key size (i.e., the list) is ONLY ever 3.

asked Dec 10 '22 by LumberSzquatch


2 Answers

The question and the accepted response here were so wrong that I had to reply.

I have a project that is handling a large amount of data that is being written to an Excel file. I store this data in a static HashMap in the form Map<List<String>, Integer>, where the size of the list is only ever 3. The number of entries in the Map, however, can range anywhere from 0 to 11,300.

Please don't take this the wrong way, but this is tiny!!! Don't even bother to optimize something like this! I quickly made a test: filling 11,300 elements into a HashMap takes less than a dozen milliseconds.

What I recently found out about HashMap, though, is how it re-sizes when its size threshold is breached. So not only is my Map re-sizing constantly, and at dramatic lengths, but it could very well have about 20,000 empty entries by the time I clear the largest set of entries.

...just to be clear: empty entries consume almost no space; they are just empty pointers, 8 bytes per slot on 64-bit machines or 4 bytes per slot on 32-bit. We're talking about a couple of hundred kilobytes at most here.
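
To put a number on the worst case from the question: about 20,000 empty slots × 8 bytes each is roughly 160 KB, which is negligible on practically any heap.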

Reinitialize the HashMap with the average size that is expected for each new entry set to limit re-sizing and allow the garbage collector to do some clean up.

It's not the average "size" of the entries; it's the average number of entries to be expected.

EDIT: Just for some context, I want to do this because occasionally the project runs out of heap memory, and I'm trying to determine how much of an impact this gigantic map is or could be having.

It's unlikely to be the map. Use a profiler! You can store millions of elements without a fuss.
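
If you want a rough sanity check before reaching for a profiler, a throwaway sketch along these lines gives a ballpark figure (it measures the whole heap before and after filling a map shaped like the one in the question, so it's only an approximation):

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapHeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();                                     // best effort, not guaranteed
        long before = rt.totalMemory() - rt.freeMemory();

        // A map shaped like the one in the question: 3-element List keys, ~11,300 entries.
        Map<List<String>, Integer> map = new HashMap<>();
        for (int i = 0; i < 11_300; i++) {
            map.put(Arrays.asList("a" + i, "b" + i, "c" + i), i);
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println(map.size() + " entries use roughly "
                + (after - before) / 1024 + " KB of heap");
    }
}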


The accepted answer is bad

You could change these values on initialisation, so an initial capacity of 11,300 and a load factor of 1, meaning the map will not increase in size until your maximum has been met, which in your case, as I understand it, will be never.

This is not good advice. Using the same capacity as the expected number of items inserted and a load factor of "one", you are bound to have a really large number of hash collisions. This will be a performance disaster.
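
If you really do want to pre-size, the safer route is to keep the default load factor and raise the capacity instead, so the resize threshold already covers the expected count. Something like this, where expectedEntries is just a placeholder:

// Keep the default load factor of 0.75 and pick a capacity large enough that
// expectedEntries <= capacity * 0.75, so no resize should ever be needed.
int expectedEntries = 11_300;   // placeholder for the real expected count
Map<List<String>, Integer> sized = new HashMap<>((int) Math.ceil(expectedEntries / 0.75));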


Conclusion

If you don't know how stuff works, don't try to micro-optimize.

answered Mar 22 '23 by dagnelies


I did some research and ended up on this page: How does a HashMap work in Java

The second-to-last heading has to do with resizing overhead, stating that the defaults for a HashMap are an initial capacity of 16 and a load factor of 0.75.

You could change these values on initialisation, so an initial capacity of 11,300 and a load factor of 1, meaning the map will not increase in size until your maximum has been met, which in your case, as I understand it, will be never.

I did a quick experiment, using this code:

public static void main(String[] args) throws Exception {
    // Pre-sized map: capacity 11,000,000 and load factor 1, so it never re-sizes.
    Map<String, Integer> map = new HashMap<>(11000000, 1);
    // Default-sized map for comparison (capacity 16, load factor 0.75):
    // Map<String, Integer> map = new HashMap<>();
    for (int i = 0; i < 11000000; i++) {
        map.put(i + "", i);
    }
    System.out.println(map.size());
    // Keep the JVM alive long enough to read its memory usage in Task Manager.
    Thread.sleep(9000);
}

I ran it twice, swapping the two Map initialisations, and then checked the memory it consumes in Task Manager.

With the initial capacity and load factor set, it uses ~1.45 GB of memory. Without them set, it uses ~1.87 GB of memory.

Re-initialising the Map every time, instead of clearing it, so that a potentially smaller Map can take its place, will be slower, but it would possibly free up more memory temporarily.

You could also do both: re-initialise to set the initial capacity and load factor properties, should you know the number of List keys for each cycle.
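
For example, if entriesThisCycle were a (hypothetical) count known before each batch, the re-initialisation could look like this, using the sizing described above:

// entriesThisCycle is a hypothetical per-batch count known (or estimated) in advance;
// the load factor of 1 matches the sizing suggested earlier in this answer.
map = new HashMap<>(entriesThisCycle, 1f);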

The article also suggests that the Java 8 HashMap, though potentially faster, could also have more memory overhead than the Java 7 one. It might be worth running the program under both versions and seeing which gives better memory usage. It would be interesting, if nothing else.

answered Mar 23 '23 by Propagandian