
Performance ConcurrentHashmap vs HashMap

How does the performance of ConcurrentHashMap compare to HashMap, especially for the .get() operation? I'm particularly interested in maps with only a few items, somewhere in the range 0-5000.

Is there any reason not to use ConcurrentHashMap instead of HashMap?

(I know that null values aren't allowed)
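A minimal sketch of that restriction, for reference. The class name `NullHandlingDemo` and the helper `acceptsNullValue` are hypothetical names chosen for illustration; only `HashMap`, `ConcurrentHashMap` and `Map.put` come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: shows which map implementation tolerates null values.
final class NullHandlingDemo {

    /** Returns true if the map accepts a null value for a non-null key. */
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            // ConcurrentHashMap rejects null keys and null values.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap: " + acceptsNullValue(new HashMap<>()));
        System.out.println("ConcurrentHashMap: " + acceptsNullValue(new ConcurrentHashMap<>()));
    }
}
```

This matters when swapping one implementation for the other: code that stores null as a "present but empty" marker in a HashMap will start throwing once the map becomes a ConcurrentHashMap.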

Update

Just to clarify: performance will obviously suffer under actual concurrent access, but how does the performance compare when there is no concurrent access?

Mauli, asked Sep 04 '09 10:09



2 Answers

I was really surprised to find that this topic is so old and yet no one has provided any tests for this case. Using ScalaMeter, I created tests of add, get and remove for both HashMap and ConcurrentHashMap in two scenarios:

  1. using single thread
  2. using as many threads as I have cores available. Note that because HashMap is not thread-safe, I simply created a separate HashMap for each thread, but used a single shared ConcurrentHashMap.
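The second scenario can be sketched in plain Java roughly as follows. This is not the ScalaMeter code from the repo; the class `MultiThreadScenario`, its method names and the key layout (each worker writes its own disjoint key range) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the multi-threaded scenario, not the original benchmark.
final class MultiThreadScenario {

    /** Every worker fills its own private HashMap. Returns the total entry count. */
    static int fillSeparateHashMaps(int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<Integer, Integer>>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t * perThread;
                Callable<Map<Integer, Integer>> task = () -> {
                    // Never shared between threads, so a plain HashMap is safe here.
                    Map<Integer, Integer> local = new HashMap<>();
                    for (int i = 0; i < perThread; i++) {
                        local.put(offset + i, i);
                    }
                    return local;
                };
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Map<Integer, Integer>> result : results) {
                total += result.get().size();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** All workers write into one shared ConcurrentHashMap. Returns its final size. */
    static int fillSharedConcurrentMap(int threads, int perThread) {
        Map<Integer, Integer> shared = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t * perThread;
                Runnable task = () -> {
                    for (int i = 0; i < perThread; i++) {
                        shared.put(offset + i, i);
                    }
                };
                pending.add(pool.submit(task));
            }
            for (Future<?> done : pending) {
                done.get(); // wait for every worker to finish
            }
            return shared.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("separate HashMaps: " + fillSeparateHashMaps(threads, 100_000));
        System.out.println("shared ConcurrentHashMap: " + fillSharedConcurrentMap(threads, 100_000));
    }
}
```

Both methods end up storing the same number of entries; wrapping each call in a timer is what turns this into a rough comparison of the two approaches.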

Code is available on my repo.

The results are as follows:

  • The X axis (size) shows the number of elements written to the map(s)
  • The Y axis (value) shows the time in milliseconds

[Charts: Add method, Get method, Remove method]

The summary

  • If you want to process your data as fast as possible, use all the threads available. That seems obvious: each thread has 1/nth of the full work to do.

  • If you choose single-threaded access, use HashMap; it is simply faster. For the add method it is as much as 3x more efficient. Only get is faster on ConcurrentHashMap, but not by much.

  • Operating on one ConcurrentHashMap with many threads is about as efficient as operating on a separate HashMap per thread, so there is no need to partition your data into different structures.

To sum up, ConcurrentHashMap performs worse when used from a single thread, but adding more threads to do the work will definitely speed up the process.
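For anyone who wants to repeat the single-threaded get comparison on their own machine, here is a naive timing sketch in plain Java. It is not the ScalaMeter benchmark used above, and the class `SingleThreadGetTiming` and its methods are hypothetical names. Hand-rolled timing like this is easily distorted by JIT warm-up and garbage collection, so treat the numbers as a rough indication only; a proper harness such as JMH is the safer choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, deliberately simple timing sketch (not a rigorous benchmark).
final class SingleThreadGetTiming {

    /** Puts the keys 0..size-1 into the map, each mapped to itself. */
    static Map<Integer, Integer> fill(Map<Integer, Integer> map, int size) {
        for (int i = 0; i < size; i++) {
            map.put(i, i);
        }
        return map;
    }

    /** Looks up every key once and returns the sum, so the loop has a visible result. */
    static long sumOfGets(Map<Integer, Integer> map, int size) {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += map.get(i);
        }
        return sum;
    }

    /** Average wall-clock nanoseconds for one full pass of gets, after a warm-up phase. */
    static long averageNanosPerPass(Map<Integer, Integer> map, int size, int passes) {
        long sink = 0;
        for (int p = 0; p < passes; p++) {
            sink += sumOfGets(map, size); // warm-up, not timed
        }
        long start = System.nanoTime();
        for (int p = 0; p < passes; p++) {
            sink += sumOfGets(map, size);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the result observable
        }
        return elapsed / passes;
    }

    public static void main(String[] args) {
        int size = 5000; // upper end of the range asked about in the question
        int passes = 2000;
        Map<Integer, Integer> hashMap = fill(new HashMap<>(), size);
        Map<Integer, Integer> concurrentMap = fill(new ConcurrentHashMap<>(), size);
        System.out.println("HashMap ns/pass: " + averageNanosPerPass(hashMap, size, passes));
        System.out.println("ConcurrentHashMap ns/pass: " + averageNanosPerPass(concurrentMap, size, passes));
    }
}
```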

Testing platform

AMD FX6100, 16GB Ram
Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8

Atais, answered Oct 07 '22 19:10


Thread safety is a complex question. If you want to make an object thread safe, do it consciously, and document that choice. People who use your class will thank you if it is thread safe when it simplifies their usage, but they will curse you if an object that once was thread safe becomes not so in a future version. Thread safety, while really nice, is not just for Christmas!

So now to your question:

ConcurrentHashMap (at least in Sun's current implementation) works by dividing the underlying map into a number of separate segments. Getting an element does not require any locking per se, but it does use atomic/volatile operations, which imply a memory barrier (potentially very costly, and interfering with other possible optimisations).

Even if all the overhead of atomic operations can be eliminated by the JIT compiler in the single-threaded case, there is still the overhead of deciding which segment to look in. Admittedly this is a relatively quick calculation, but it is nevertheless impossible to eliminate.

As for deciding which implementation to use, the choice is probably simple.

If this is a static field, you almost certainly want to use ConcurrentHashMap, unless testing shows this is a real performance killer. Static state is shared by every user of the class, so the class itself has different thread safety expectations from its instances.

If this is a local variable, then chances are a HashMap is sufficient - unless you know that references to the object can leak out to another thread. By coding to the Map interface, you allow yourself to change it easily later if you discover a problem.
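A small sketch of what coding to the Map interface looks like for a local variable. The class `WordCount` and the method `countWords` are made-up names for illustration; the point is that only the constructor call mentions a concrete implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: the map is a local variable typed as the Map interface.
final class WordCount {

    /** Counts how often each word occurs in the list. */
    static Map<String, Integer> countWords(List<String> words) {
        // Only this line names the implementation. Switching to
        // new ConcurrentHashMap<>() later would not touch the rest of the method.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("a", "b", "a")));
    }
}
```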

If this is an instance field, and the class hasn't been designed to be thread safe, then document it as not thread safe, and use a HashMap.

If you know that this instance field is the only reason the class isn't thread safe, and are willing to live with the restrictions that promising thread safety implies, then use ConcurrentHashMap, unless testing shows significant performance implications. In that case, you might consider allowing a user of the class to choose a thread safe version of the object somehow, perhaps by using a different factory method.
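One way the "different factory method" idea could look, as a sketch under assumptions: `Registry`, `create` and `createThreadSafe` are hypothetical names, and the class is kept simple enough that each method is a single map operation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class whose only shared mutable state is one map field.
final class Registry {
    private final Map<String, String> entries;

    private Registry(Map<String, String> entries) {
        this.entries = entries;
    }

    /** Not thread safe: backed by a plain HashMap. */
    static Registry create() {
        return new Registry(new HashMap<>());
    }

    /** Thread safe for individual register/lookup calls: backed by a ConcurrentHashMap. */
    static Registry createThreadSafe() {
        return new Registry(new ConcurrentHashMap<>());
    }

    void register(String name, String value) {
        entries.put(name, value);
    }

    /** Returns the registered value, or null if the name is unknown. */
    String lookup(String name) {
        return entries.get(name);
    }
}
```

This only works because every method delegates to exactly one map call. As soon as a method combines several operations (check, then put, for example), swapping the map implementation is no longer enough to keep the thread safety promise.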

In either case, document the class as being thread safe (or conditionally thread safe) so people who use your class know they can use objects across multiple threads, and people who edit your class know that they must maintain thread safety in future.

Bill Michell, answered Oct 07 '22 19:10