Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Disadvantage of increasing number of partition in Java ConcurrentHashMap?

Java ConcurrentHashMap maintains partitions internally. Each partition can have locking separately. There can be scenarios where all the keys accessed by multiple thread fall into the same partition and partitions may not be helpful. Increasing the number of partitions further should improve the concurrency.

Why does Java provides default value for partition count as 16 instead of very high value? What is the performance overhear with large number of partitions in the Map?

like image 567
saurabh Avatar asked Jun 21 '13 11:06

saurabh


2 Answers

Why does Java provides default value for partition count as 16 instead of very high value?

It is very rare to have these many CPUs (The number of threads is not so important) using the same CHM at a the same time. If you really need this, there is usually a better way to write your application which avoids this.

For example say you have 1000 threads but only 8 CPUs. This means only 8 threads at most will be running and accessing the CHM, assuming your program doesn't do anything useful e.g. anything else.

In real programs, it is rare for one collection to be used more than 10% of time. This is because there is usually some IO involved, or it makes sense to restructure threads to use there own copies of collections and collect them together at the end e.g. Map-Reduce

What is the performance overhear with large number of partitions in the Map?

You waste a bit of memory which doesn't matter, but mostly you waste some L1 cache which is limited to 32 KB and a relatively precious resources.

like image 141
Peter Lawrey Avatar answered Oct 20 '22 01:10

Peter Lawrey


Here's what the javadoc says (Java 6):

"The allowed concurrency among update operations is guided by the optional concurrencyLevel constructor argument (default 16), which is used as a hint for internal sizing. The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention. But overestimates and underestimates within an order of magnitude do not usually have much noticeable impact. A value of one is appropriate when it is known that only one thread will modify and all others will only read. Also, resizing this or any other kind of hash table is a relatively slow operation, so, when possible, it is a good idea to provide estimates of expected table sizes in constructors."

So the short answer is that the default value (16) is a compromise between limiting concurrency and wasting space. A "very high" value would waste a lot of space. (And as Peter Lawrey notes, that can lead to reduced performance due to memory cache effects.)

The other thing to note is that the LinkedHashMap implementation silently caps the value of concurrencyLevel at 216. (At least, that's what the Java 6 code does.) It is hard to envisage a real-world scenario where you'd need that much concurrency though.

like image 40
Stephen C Avatar answered Oct 19 '22 23:10

Stephen C