Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory Optimization of Java Collectors.toMap

I have a function that converts a list to a map. The map's size won't change after this function is called. I'm trying to decide between the following two implementations:

Map<Long, Object> listToMap(List<Object> objs) {
        /* Implementation One: */

        Map<Long, Object> map = new HashMap<>(objs.size(), 1);
        for (Object obj : objs) {
            map.put(obj.getKey(), obj);
        }
        return map;

        /* Implementation Two: */

        return objs.stream().collect(Collectors.toMap(Object::getKey, obj -> obj));

    }

In the first implementation, I've allocated just enough memory for all the elements by using a load factor of 1 and the list's size. This ensures that a resize operation won't be performed. Then, I iterate through the list and add elements one-by-one.

In the second implementation, I use Java 8 streams for better readability.

My question is: will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?

like image 430
Justin Borromeo Avatar asked Jul 13 '18 22:07

Justin Borromeo


2 Answers

The second implementation will involve multiple resizes of the HashMap.

I determined this by simply running it in a debugger and breaking every time the hash map gets resized. First I tweaked the code you posted to make it compile on my system:

import java.util.*;
import java.util.stream.*;

class Test {
  public static void main(String[] args) {
    List<Object> list = new ArrayList<Object>();
    for(int i=0; i<100000; i++) {
      list.add(new Integer(i));
    }
    new Test().listToMap(list);
  }

    Map<Integer, Object> listToMap(List<Object> objs) {
        return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
    }
}

Then I compiled it and ran it in the debugger until it hit listToMap:

$ javac Test.java && jdb Test
Initializing jdb ...
> stop in Test.listToMap
Deferring breakpoint Test.listToMap.
It will be set after the class is loaded.
> run
run Test
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: Set deferred breakpoint Test.listToMap

Breakpoint hit: "thread=main", Test.listToMap(), line=14 bci=0
14            return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));

main[1]

Then I set a breakpoint in java.util.HashMap.resize and continued:

main[1] stop in java.util.HashMap.resize
Set breakpoint java.util.HashMap.resize
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1]

and continued some more until I got bored:

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), 
line=678 bci=0

main[1] print size
 size = 3073
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] print size
 size = 6145
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0

main[1] print size
 size = 12289

So yes: it most definitely keeps resizing over and over.

like image 139
that other guy Avatar answered Nov 20 '22 10:11

that other guy


Will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?

In your code, the former. See https://stackoverflow.com/a/51333961/139985

It is worth noting that for your current implementation:

  1. Much of the extra memory consumed by resizing will be reclaimed on the next GC run.
  2. After the collect finishes, you could still end up with a main hash array that is up to 2 times too big. The memory "wasted" could be up to 8 bytes per entry in the table, but on average it will be 4 bytes per entry.
  3. Even so, the hash entry nodes will be the largest consumer of memory in a HashMap. Each entry consumes roughly 32 bytes ... in addition to the space used to represent the key and value.

(The above numbers assume 64 bit references.)


As an alternative, if you use the 4 argument overload of toMap() you can provide a Supplier to create the Map to be populated. That allows you to do the following:

  • Allocate a HashMap with an initial capacity large enough to avoid resizing, but not too large.
  • Use a (hypothetical) alternative implementation of Map which uses less memory per entry than HashMap.
  • Create a wrapper to populate a map-like object that doesn't implement Map<K,V> ... for your K and V types. (For example, you could potentially use TLongObjectHashMap from the GNU Trove library.)

(In the latter two cases, the goal is to find a Map or "map-like" class that uses less memory (for your K and V types) but still has appropriate performance for lookups.)

like image 7
Stephen C Avatar answered Nov 20 '22 10:11

Stephen C