Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java 8 hashmap high memory usage

I use a hashmap to store a QTable for an implementation of a reinforcement learning algorithm. My hashmap should store 15000000 entries. When I ran my algorithm I saw that the memory used by the process is over 1000000K. When I calculated the memory, I would expect it to use not more than 530000K. I tried to write an example and I got the same high memory usage:

public static void main(String[] args) {
    HashMap map = new HashMap<>(16_000_000, 1);
    for(int i = 0; i < 15_000_000; i++){
        map.put(i, i);
    }
}

My memory calulation:

Each entryset is 32 bytes
Capacity is 15000000
HashMap Instance uses: 32 * SIZE + 4 * CAPACITY memory = (15000000 * 32 + 15000000 * 4) / 1024 = 527343.75K

Where I'm wrong in my memory calculations?

like image 427
Lev Levin Avatar asked Dec 24 '16 14:12

Lev Levin


People also ask

How much memory does a HashMap use Java?

As Figure 7 shows, when a HashMap is created, the result is a HashMap object and an array of HashMap$Entry objects at its default capacity of 16 entries. This gives a HashMap a size of 128 bytes when it is completely empty.

Can HashMap cause memory leak?

HashSet and HashMap use these methods in many operations, and if they're not overridden correctly, they can become a source for potential memory leak problems. Now we'll insert duplicate Person objects into a Map that uses this key.

How memory is allocated in HashMap?

… A HashMap stores data into multiple singly linked lists of entries (also called buckets or bins). All the lists are registered in an array of Entry (Entry<K,V>[] array) and the default capacity of this inner array is 16.


1 Answers

Well, in the best case, we assume a word size of 32 bits/4 bytes (with CompressedOops and CompressedClassesPointers). Then, a map entry consists of two words JVM overhead (klass pointer and mark word), key, value, hashcode and next pointer, making 6 words total, in other words, 24 bytes. So having 15,000,000 entry instances will consume 360 MB.

Additionally, there’s the array holding the entries. The HashMap uses capacities that are a power of two, so for 15,000,000 entries, the array size is at least 16,777,216, consuming 64 MiB.

Then, you have 30,000,000 Integer instances. The problem is that map.put(i, i) performs two boxing operations and while the JVM is encouraged to reuse objects when boxing, it is not required to do so and reusing won’t happen in your simple program that is likely to complete before the optimizer ever interferes.

To be precise, the first 128 Integer instances are reused, because for values in the -128 … +127 range, sharing is mandatory, but the implementation does this by initializing the entire cache on the first use, so for the first 128 iterations, it doesn’t create two instances, but the cache consists of 256 instances, which is twice that number, so we end up again with 30,000,000 Integer instances total. An Integer instance consist of at least the two JVM specific words and the actual int value, which would make 12 bytes, but due to the default alignment, the actually consumed memory will be 16 bytes, dividable by eight.

So the 30,000,000 created Integer instances consume 480 MB.

This makes a total of 360 MB + 64 MiB + 480 MB, which is more than 900 MB, making a heap size of 1 GB entirely plausible.

But that’s what profiling tools are for. After running your program, I got

The used memory sorted by size

Note that this tool only reports the used size of the objects, i.e. the 12 bytes for an Integer object without considering the padding that you will notice when looking at the total memory allocated by the JVM.

like image 72
Holger Avatar answered Oct 05 '22 23:10

Holger