I have a function that converts a list to a map. The map's size won't change after this function is called. I'm trying to decide between the following two implementations:
Map<Long, Object> listToMap(List<Object> objs) {
    /* Implementation One: */
    Map<Long, Object> map = new HashMap<>(objs.size(), 1);
    for (Object obj : objs) {
        map.put(obj.getKey(), obj);
    }
    return map;

    /* Implementation Two: */
    return objs.stream().collect(Collectors.toMap(Object::getKey, obj -> obj));
}
In the first implementation, I've allocated just enough memory for all the elements by passing the list's size as the initial capacity and using a load factor of 1. Since a HashMap only resizes once its entry count exceeds capacity × load factor, this ensures that a resize operation won't be performed. Then I iterate through the list and add the elements one by one.
In the second implementation, I use Java 8 streams for better readability.
My question is: will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?
The second implementation will involve multiple resizes of the HashMap.
I determined this by simply running it in a debugger and breaking every time the hash map gets resized. First I tweaked the code you posted to make it compile on my system:
import java.util.*;
import java.util.stream.*;

class Test {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<Object>();
        for (int i = 0; i < 100000; i++) {
            list.add(Integer.valueOf(i));
        }
        new Test().listToMap(list);
    }

    Map<Integer, Object> listToMap(List<Object> objs) {
        return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
    }
}
Then I compiled it and ran it in the debugger until it hit listToMap:
$ javac Test.java && jdb Test
Initializing jdb ...
> stop in Test.listToMap
Deferring breakpoint Test.listToMap.
It will be set after the class is loaded.
> run
run Test
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: Set deferred breakpoint Test.listToMap
Breakpoint hit: "thread=main", Test.listToMap(), line=14 bci=0
14 return objs.stream().collect(Collectors.toMap(Object::hashCode, obj -> obj));
main[1]
Then I set a breakpoint in java.util.HashMap.resize and continued:
main[1] stop in java.util.HashMap.resize
Set breakpoint java.util.HashMap.resize
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1]
and continued some more until I got bored:
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(),
line=678 bci=0
main[1] print size
size = 3073
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] print size
size = 6145
main[1] cont
>
Breakpoint hit: "thread=main", java.util.HashMap.resize(), line=678 bci=0
main[1] print size
size = 12289
So yes: it most definitely keeps resizing over and over.
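For a quicker check that doesn't require a debugger, you can also inspect the HashMap's internal bucket array ("table") reflectively after the collect and compare its length with the number of entries. This is only a rough sketch: it pokes at JDK internals, so on Java 9 and later it may need --add-opens java.base/java.util=ALL-UNNAMED to run:

import java.lang.reflect.Field;
import java.util.*;
import java.util.stream.*;

class TableSize {
    public static void main(String[] args) throws Exception {
        List<Object> list = new ArrayList<Object>();
        for (int i = 0; i < 100000; i++) {
            list.add(Integer.valueOf(i));
        }
        Map<Integer, Object> map = list.stream()
                .collect(Collectors.toMap(Object::hashCode, obj -> obj));

        // HashMap keeps its buckets in a private field named "table";
        // its length is the current capacity after all resizes.
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(map);
        System.out.println("entries = " + map.size() + ", buckets = " + table.length);
    }
}

The bucket count it prints is far larger than the default capacity of 16 that the map started with, which is consistent with the repeated doublings seen in the debugger session above.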
Will the second implementation involve multiple resizes of the HashMap or has it been optimized to allocate just enough memory?
In your code, the former. See https://stackoverflow.com/a/51333961/139985
It is worth noting that for your current implementation:

- When collect finishes, you could still end up with a main hash array that is up to 2 times too big. The memory "wasted" could be up to 8 bytes per entry in the table, but on average it will be 4 bytes per entry.
- The bigger cost is the entry nodes of the HashMap. Each entry consumes roughly 32 bytes ... in addition to the space used to represent the key and value.

(The above numbers assume 64 bit references.)
As an alternative, if you use the 4 argument overload of toMap() you can provide a Supplier to create the Map to be populated. That allows you to do any of the following:

- Create a HashMap with an initial capacity large enough to avoid resizing, but not too large (see the sketch after this list).
- Use an alternative implementation of Map which uses less memory per entry than HashMap.
- Use a more specialized alternative to Map<K,V> ... for your K and V types. (For example, you could potentially use TLongObjectHashMap from the GNU Trove library.)

(In the latter two cases, the goal is to find a Map or "map-like" class that uses less memory (for your K and V types) but still has appropriate performance for lookups.)
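To make the first option concrete, here is a minimal sketch of the 4 argument toMap() with a pre-sized HashMap. The MyObject element type and its getKey() accessor are stand-ins for the question's element type (java.lang.Object itself has no getKey()), and the merge function simply rejects duplicate keys, like the 2 argument toMap() does:

import java.util.*;
import java.util.stream.*;

class PresizedToMap {
    // Hypothetical element type standing in for the question's objects.
    static class MyObject {
        final long key;
        MyObject(long key) { this.key = key; }
        long getKey() { return key; }
    }

    static Map<Long, MyObject> listToMap(List<MyObject> objs) {
        return objs.stream().collect(Collectors.toMap(
                MyObject::getKey,
                obj -> obj,
                // Merge function: treat duplicate keys as an error, matching
                // the behaviour of the 2-argument toMap().
                (a, b) -> { throw new IllegalStateException("duplicate key " + a.getKey()); },
                // Map factory: pre-size so objs.size() entries fit under the
                // default load factor of 0.75 without triggering a resize.
                () -> new HashMap<Long, MyObject>((int) Math.ceil(objs.size() / 0.75))));
    }

    public static void main(String[] args) {
        List<MyObject> list = new ArrayList<MyObject>();
        for (long i = 0; i < 100000; i++) {
            list.add(new MyObject(i));
        }
        System.out.println("mapped " + listToMap(list).size() + " entries");
    }
}

Pre-sized like this, the backing table is allocated once at its final size instead of being doubled repeatedly during the collect. (On newer JDKs, HashMap.newHashMap(objs.size()) performs the same capacity calculation for you.)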