I was doing some tests with a colleague, and we were pulling data in from a database (about 350,000 records), converting each record into an object and a key object, and then populating them into an ImmutableMap.Builder.
When we called the build() method it took forever, probably due to all the data integrity checks that come with ImmutableMap (dupe keys, nulls, etc). To be fair we tried to use a hashmap as well, and that took awhile but not as long as the ImmutableMap. We finally settled on just using ConcurrentHashMap which we populated with 9 threads as the records were iterated, and wrapped that in an unmodifiable map. The performance was good.
I noticed on the documentation it read ImutableMap is not optimized for "equals()" operations. As a die-hard immutable-ist, I'd like the ImmutableMap to work for large data volumes but I'm getting the sense it is not meant for that. Is that assumption right? Is it optimized only for small/medium-sized data sets? Is there a hidden implementation I need to invoke via "copyOf()" or something?
My experience is that none of Java's built in Collection
classes are really optimised for performance at huge volumes. For example HashMap uses simple iteration once hashCode has been used as an index in the array and compares the key via equals
to each item with the same hash. If you are going to store several million items in the map then you need a very well designed hash and large capacity. These classes are designed to be as generic and safe as possible.
So performance optimisations to try if you wish to stick with the standard Java HashMap:
HashMap
will be expanded many times. Ideally set initial capacity as close as possible to the final value.equals
implementation is as efficient as possible.There are massive performance optimisations you can apply if you know (for example) that your key is an integer, for example using some form of btree after the hash has been applied and using ==
rather than equals
.
So the simple answer is that I believe you will need to write your own collection to get the performance you want or use one of the more optimised implementations available.
I guess your key.equals()
is a time consuming method.
key.equals()
will be called much more times in ImmutableMap.build()
than HashMap.put()
(in a loop). And key.hashCode()
is called same times, both HashMap.put()
and ImmutableMap.build()
. As result, if key.equals()
takes long time, the whole duration can be different much.
key.equals()
are called a few times during HashMap.put()
(Good hash algorithm leads a few collision).
While in case of ImmutableMap.build()
, key.equals()
will be called many times when checkNoConflictInBucket()
. O(n) for key.equals()
.
Once the map is built, two types of Map should not be different much when access, as both are hash-based.
Sample:
There are 10000 random String to put as keys. HashMap.put()
callsString.equals()
2 times, while ImmutableMap.build()
calls 3000 times.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With