Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is ImmutableMap a sub-optimal choice for large volume of keys/objects/

I was doing some tests with a colleague, and we were pulling data in from a database (about 350,000 records), converting each record into an object and a key object, and then populating them into an ImmutableMap.Builder.

When we called the build() method it took forever, probably due to all the data integrity checks that come with ImmutableMap (dupe keys, nulls, etc). To be fair we tried to use a hashmap as well, and that took awhile but not as long as the ImmutableMap. We finally settled on just using ConcurrentHashMap which we populated with 9 threads as the records were iterated, and wrapped that in an unmodifiable map. The performance was good.

I noticed on the documentation it read ImutableMap is not optimized for "equals()" operations. As a die-hard immutable-ist, I'd like the ImmutableMap to work for large data volumes but I'm getting the sense it is not meant for that. Is that assumption right? Is it optimized only for small/medium-sized data sets? Is there a hidden implementation I need to invoke via "copyOf()" or something?

like image 328
tmn Avatar asked Feb 01 '15 22:02

tmn


Video Answer


2 Answers

My experience is that none of Java's built in Collection classes are really optimised for performance at huge volumes. For example HashMap uses simple iteration once hashCode has been used as an index in the array and compares the key via equals to each item with the same hash. If you are going to store several million items in the map then you need a very well designed hash and large capacity. These classes are designed to be as generic and safe as possible.

So performance optimisations to try if you wish to stick with the standard Java HashMap:

  1. Make sure your hash function provides as close as possible to even distribution. Many domains have skewed values and your hash needs to take account of this.
  2. When you have a lot of data HashMap will be expanded many times. Ideally set initial capacity as close as possible to the final value.
  3. Make sure your equals implementation is as efficient as possible.

There are massive performance optimisations you can apply if you know (for example) that your key is an integer, for example using some form of btree after the hash has been applied and using == rather than equals.

So the simple answer is that I believe you will need to write your own collection to get the performance you want or use one of the more optimised implementations available.

like image 162
sprinter Avatar answered Oct 26 '22 23:10

sprinter


I guess your key.equals() is a time consuming method.

key.equals() will be called much more times in ImmutableMap.build() than HashMap.put() (in a loop). And key.hashCode() is called same times, both HashMap.put() and ImmutableMap.build(). As result, if key.equals() takes long time, the whole duration can be different much.

key.equals() are called a few times during HashMap.put() (Good hash algorithm leads a few collision). While in case of ImmutableMap.build(), key.equals() will be called many times when checkNoConflictInBucket(). O(n) for key.equals().

Once the map is built, two types of Map should not be different much when access, as both are hash-based.

Sample: There are 10000 random String to put as keys. HashMap.put() calls
String.equals() 2 times, while ImmutableMap.build() calls 3000 times.

like image 26
卢声远 Shengyuan Lu Avatar answered Oct 26 '22 22:10

卢声远 Shengyuan Lu