Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why ConcurrentHashMap cannot have a lock for each bucket?

As we know, java's ConcurrentHashMap has a number of internal locks, and each of them guards some region of the bucket's array.

A question is: why cannot we create a lock for each bucket?

A similair question was already asked: Disadvantage of increasing number of partition in Java ConcurrentHashMap?

According to the answer, there are several reasons:

  1. the maximum number of threads running simultaneously is limited by the number of cores of the processor. Is this correct? Can we ALWAYS state that if we have 8-core processor we do not need more than 8 locked regions in ConcurrentHashMap?

  2. There is a waste of L2 cache. Why?

  3. There is a waste of memory. Looks like this is because of additional lock creation.

Is there any more reasons?

like image 443
MiamiBeach Avatar asked Aug 20 '14 17:08

MiamiBeach


People also ask

Will the lock be acquired for all the buckets in ConcurrentHashMap?

Lock-free: The ConcurrentHashMap is designed to be lock-free, which means that there is no need to acquire a lock in order to read or write data.

Does ConcurrentHashMap use locks?

In ConcurrentHashMap, at a time any number of threads can perform retrieval operation but for updated in the object, the thread must lock the particular segment in which the thread wants to operate. This type of locking mechanism is known as Segment locking or bucket locking.

In which part of ConcurrentHashMap the lock is implemented?

ConcurrentHashMap: It allows concurrent access to the map. Part of the map called Segment (internal data structure) is only getting locked while adding or updating the map.

How many threads can work on ConcurrentHashMap?

The default size is 16, meaning max 16 threads can work at a time. Each thread can work on each segment during high concurrency and at most 16 threads can operate at max which simply maintains 16locks to guard each bucket of the ConcurrentHashMap.


2 Answers

Hopefully I do a decent job of explaining... kind of rushed at the moment...

The answer to your first question:

"why cannot we create a lock for each bucket?"

Is that you can create a lock for each bucket - it just isn't necessarily the best course of action.

The answer to your question:

"Can we ALWAYS state that if we have 8-core processor we do not need more than 8 locked regions in ConcurrentHashMap"

is technically "No", though it depends on what you mean by "need". Having a number of regions that matches your system's maximum concurrency or is slightly greater does not necessarily prevent contention, but in practice it works pretty well. There's nothing stopping two threads from attempting to access the same region at the same time, even if there are other regions that aren't locked.

What you can guarantee by having 8 regions or more on an 8-core processor is that all regions can be accessed simultaneously without contention. If you have 8 cores (not Hyper Threaded) you can perform at most 8 operations at the same time. Even then the ideal number of regions might be more (say, 16) than the number of cores because it will make contention less likely at a low cost (only 8 additional locks).

The benefit from having additional regions eventually diminishes as the number of regions increases relative to your maximum concurrency, which leads to them being a waste of space (memory), as mentioned in the JavaDoc. It's a balance between likelihood of contention (given a lock on one region, what is the probability another thread will attempt to access it) and wasted space.

There are a couple of other factors that will affect performance of a ConcurrentHashMap:

  • Execution time of locked code - it's good practice to make locked code sections small so that they complete quickly and release their locks. The more quickly locks are released, the more quickly contention is resolved.
  • Distribution of data - Nicely-distributed data tends to perform better under high concurrency. Having all of your data clustered within a single region means that you will always encounter contention.
  • Data access pattern - Accessing different regions of data at the same time will perform better, as your threads won't be contending for resource locks. Having nicely-distributed data doesn't matter if you only attempt to access one region at a time.

No matter how many regions there are, all three of those things can positively or negatively affect performance, and can make the number of regions less relevant. Since they play a big part, they make it less likely that having significantly more regions will help you in general. Since you can only execute so many threads at the same time, having threads that quickly complete their work and release their locks is a better focus.

As to your question about the cache: I'm honestly not sure, but I can take a guess. When you're using the map heavily those locks will end up on the cache and take up space, potentially bumping out other things which could be more useful. Cache is much more scarce than main memory, and cache misses waste a lot of time. I think the idea here is a general aversion to putting lots of things on the cache that don't offer a significant benefit. Taken to the extreme: if the cache is filled with locks (somehow) and every data call goes out to memory, you are taking a performance hit.

like image 154
Patrick Garrity Avatar answered Sep 21 '22 10:09

Patrick Garrity


Can we ALWAYS state that if we have 8-core processor we do not need more than 8 locked regions in ConcurrentHashMap?

No, this is completely wrong. It depends on two factors, the number of threads (concurrency) and the number of segment collisions. If two threads compete for the same segment, one thread might block the other.

While you can have only as much threads owning a core as you have cores, the big mistake with the above statement is to assume that a thread not running on a core can’t own a lock. But a thread owning a lock can still loose the CPU on a task switch for the next thread which then gets blocked when trying to acquire the same lock.

But it’s not unusual to adjust the number of threads to the number of cores, especially for computational intense tasks. So the concurrency level of a ConcurrentHashMap depends indirectly on the number of cores in typical setups.


Having a lock for each bucket would imply maintaining a lock state and a waiting queue for each bucket which means quite a lot of resources. Keep in mind that the lock is only required for concurrent write operations but not for the reading threads.

However, for the Java 8 implementation this consideration is obsolete. It uses a wait-free algorithm for bucket updates, at least for buckets without collisions. This is a bit like having a lock per bucket as threads operating on different buckets do not interfere with each other but without the overhead of maintaining a lock state & wait queue. The only thing to care about is to give the map an appropriate initial size. Consequently, the concurrencyLevel, if specified, is used as an initial sizing hint, but otherwise ignored.

like image 22
Holger Avatar answered Sep 18 '22 10:09

Holger