I have a function that converts a list to a map. The map's size won't change after this function is called. I'm trying to decide between the following two implementations:
Map<Long, Object> listToMap(List<Object> objs) {
    /* Implementation One: */
    Map<Long, Object> map = new HashMap<>(objs.size(), 1);
    for (Object obj : objs) {
        map.put(obj.getKey(), obj);
    }
    return map;

    /* Implementation Two: */
    return objs.stream().collect(Collectors.toMap(Object::getKey, obj -> obj));
}
In the first implementation, I've allocated just enough memory for all the elements by passing the list's size as the initial capacity and using a load factor of 1. Since a HashMap only resizes once its entry count exceeds capacity × load factor, this ensures that a resize operation won't be performed. Then I iterate through the list and add the elements one by one.
In the second implementation, I use Java 8 streams for better readability.
My question is: will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?
The second implementation will involve multiple resizes of the HashMap.
I determined this by simply running it in a debugger and breaking every time the hash map gets resized. First I tweaked the code you posted to make it compile on my system:
import java.util.*;
import java.util.stream.*;

class Test {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<Object>();
        for (int i = 0; i < 100000; i++) {
            list.add(Integer.valueOf(i));
        }
        new Test().listToMap(list);
    }

    Map<Integer, Object> listToMap(List<Object> objs) {
        return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
    }
}
Then I compiled it and ran it in the debugger until it hit listToMap:
$ javac Test.java && jdb Test
Initializing jdb ...
> stop in Test.listToMap
Deferring breakpoint Test.listToMap.
It will be set after the class is loaded.
> run
run Test
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: Set deferred breakpoint Test.listToMap
Breakpoint hit: "thread=main", Test.listToMap(), line=14 bci=0
14 return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
main[1]
Then I set a breakpoint in java.util.HashMap.resize and continued:
main[1] stop in java.util.HashMap.resize
Set breakpoint java.util.HashMap.resize
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1]
and continued some more until I got bored:
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(),
line=678 bci=0
main[1] print size
size = 3073
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] print size
size = 6145
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] print size
size = 12289
So yes: it most definitely keeps resizing over and over.
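For a quicker check that doesn't require a debugger, you can also inspect the HashMap's internal bucket array ("table") reflectively after the collect and compare its length with the number of entries. This is only a rough sketch: it pokes at JDK internals, so on Java 9 and later it may need --add-opens java.base/java.util=ALL-UNNAMED to run:

import java.lang.reflect.Field;
import java.util.*;
import java.util.stream.*;

class TableSize {
    public static void main(String[] args) throws Exception {
        List<Object> list = new ArrayList<Object>();
        for (int i = 0; i < 100000; i++) {
            list.add(Integer.valueOf(i));
        }
        Map<Integer, Object> map = list.stream()
                .collect(Collectors.toMap(Object::hashCode, obj -> obj));

        // HashMap keeps its buckets in a private field named "table";
        // its length is the current capacity after all resizes.
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(map);
        System.out.println("entries = " + map.size() + ", buckets = " + table.length);
    }
}

The bucket count it prints is far larger than the default capacity of 16 that the map started with, which is consistent with the repeated doublings seen in the debugger session above.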
Will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?
In your code, the former. See https://stackoverflow.com/a/51333961/139985
It is worth noting that for your current implementation:

- When collect finishes, you could still end up with a main hash array that is up to 2 times too big. The memory "wasted" could be up to 8 bytes per entry in the table, but on average it will be 4 bytes per entry.
- The bigger cost is the entry nodes of the HashMap. Each entry consumes roughly 32 bytes ... in addition to the space used to represent the key and value.

(The above numbers assume 64 bit references.)
As an alternative, if you use the 4 argument overload of toMap() you can provide a Supplier to create the Map to be populated. That allows you to do any of the following:

- Create a HashMap with an initial capacity large enough to avoid resizing, but not too large (see the sketch after this list).
- Use an alternative implementation of Map which uses less memory per entry than HashMap.
- Use a more specialized alternative to Map<K,V> ... for your K and V types. (For example, you could potentially use TLongObjectHashMap from the GNU Trove library.)

(In the latter two cases, the goal is to find a Map or "map-like" class that uses less memory (for your K and V types) but still has appropriate performance for lookups.)
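To make the first option concrete, here is a minimal sketch of the 4 argument toMap() with a pre-sized HashMap. The MyObject element type and its getKey() accessor are stand-ins for the question's element type (java.lang.Object itself has no getKey()), and the merge function simply rejects duplicate keys, like the 2 argument toMap() does:

import java.util.*;
import java.util.stream.*;

class PresizedToMap {
    // Hypothetical element type standing in for the question's objects.
    static class MyObject {
        final long key;
        MyObject(long key) { this.key = key; }
        long getKey() { return key; }
    }

    static Map<Long, MyObject> listToMap(List<MyObject> objs) {
        return objs.stream().collect(Collectors.toMap(
                MyObject::getKey,
                obj -> obj,
                // Merge function: treat duplicate keys as an error, matching
                // the behaviour of the 2-argument toMap().
                (a, b) -> { throw new IllegalStateException("duplicate key " + a.getKey()); },
                // Map factory: pre-size so objs.size() entries fit under the
                // default load factor of 0.75 without triggering a resize.
                () -> new HashMap<Long, MyObject>((int) Math.ceil(objs.size() / 0.75))));
    }

    public static void main(String[] args) {
        List<MyObject> list = new ArrayList<MyObject>();
        for (long i = 0; i < 100000; i++) {
            list.add(new MyObject(i));
        }
        System.out.println("mapped " + listToMap(list).size() + " entries");
    }
}

Pre-sized like this, the backing table is allocated once at its final size instead of being doubled repeatedly during the collect. (On newer JDKs, HashMap.newHashMap(objs.size()) performs the same capacity calculation for you.)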