
ConcurrentHashMap constructor parameters?


I am wondering about the parameters for constructing a ConcurrentHashMap:

  • initialCapacity is 16 by default (understood).
  • loadFactor is 0.75 by default.
  • concurrencyLevel is 16 by default.

My questions are:

  • What criteria should be used to adjust loadFactor up or down?
  • How do we establish the number of concurrently updating threads?
  • What criteria should be used to adjust concurrencyLevel up or down?

Additionally:

  • What are the hallmarks of a good hashcode implementation? (If an SO question addresses this, just link to it.)

Thank you!

Asked Oct 15 '09 by non sequitor



2 Answers

The short answer: set "initial capacity" to roughly how many mappings you expect to put in the map, and leave the other parameters at their default.
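Following that short answer, a minimal sketch of sizing the map to the expected number of mappings (the figure of 1,000 is purely illustrative) might look like this:

```java
import java.util.concurrent.ConcurrentHashMap;

public class CapacityDemo {
    public static void main(String[] args) {
        // Expecting roughly 1000 mappings (an assumed figure): pass that as
        // the initial capacity and leave loadFactor/concurrencyLevel at
        // their defaults, as the answer suggests.
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>(1000);

        for (int i = 0; i < 1000; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 1000
    }
}
```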

Long answer:

  • load factor is the ratio between the number of "buckets" in the map and the number of expected elements;

  • 0.75 is usually a reasonable compromise: as I recall, it means that with a good hash function, we expect on average about 1.6 probes to find an element in the map (or around that figure);

    • changing the load factor shifts the trade-off between probes per lookup and wasted space: a lower value means fewer collisions but a larger table, but 0.75 really is usually a good value;

  • in principle, set concurrencyLevel to the number of concurrent threads you expect to have modifying the map, although overestimating this doesn't appear to have any bad effect other than wasting memory (I wrote a little on ConcurrentHashMap performance a while ago in case you're interested)
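Putting the three parameters together, the full three-argument constructor can be sketched as follows; the expected-entry count and writer-thread count here are assumptions for illustration, not recommendations:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConstructorDemo {
    public static void main(String[] args) {
        int expectedEntries = 500;  // assumption: roughly how many mappings we'll store
        float loadFactor = 0.75f;   // the default; rarely worth changing
        int writerThreads = 8;      // assumption: expected number of concurrently updating threads

        ConcurrentHashMap<String, String> map =
                new ConcurrentHashMap<>(expectedEntries, loadFactor, writerThreads);

        map.put("status", "ready");
        System.out.println(map.get("status")); // ready
    }
}
```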

Informally, your hash function should essentially aim to have as much "randomness" in the bits as possible. Or more strictly, the hash code for a given element should give each bit a roughly 50% chance of being set. It's actually easier to illustrate this with an example: again, you may be interested in some stuff I wrote about how the String hash function works and associated hash function guidelines. Feedback is obviously welcome on any of this stuff.

One thing I also mention at some point is that you don't have to be too paranoid in practice: if your hash function produces a "reasonable" amount of randomness in some of the bits, then it will often be OK. In the worst case, sticking representative pieces of data into a string and taking the hash code of the string actually doesn't work so badly.
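The worst-case fallback mentioned above, concatenating representative fields into a String and reusing its hash code, can be sketched like this (the Point class and its fields are hypothetical):

```java
public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Fallback technique: concatenate representative fields into a String
    // and reuse String's well-tested hash function.
    @Override
    public int hashCode() {
        return (x + "," + y).hashCode();
    }

    public static void main(String[] args) {
        // Equal points must produce equal hash codes.
        System.out.println(new Point(1, 2).hashCode() == new Point(1, 2).hashCode()); // true
    }
}
```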

Answered Oct 07 '22 by Neil Coffey

Load factor is primarily related to the quality of the hash function. The closer to zero the load factor, the less likely collisions are, even if the hash function isn't so great. The trade-off is that the memory footprint is larger. In other words, the HashMap isn't giving each distinct hashcode its own bucket; it groups them by proximity, so the more buckets it has, the more spread out the distribution, and the less likely collisions become.

So the bottom line is you fiddle with load factor to improve lookup time or reduce memory, according to your needs and the objects you are storing in the Map.

ConcurrencyLevel really depends on your application. If you only have two or three threads running in the application, there you go. If you are an application server with an arbitrary number of threads, then you need to understand what your load capacity is and what point you want to optimize for.

A good quality hashcode implementation provides as wide a distribution across potential values of the object as possible with the least number of collisions, while honoring the contract. In other words, it allows the HashMap (or Set as the case may be) to distribute the objects into separate buckets making lookups faster.
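As a concrete sketch of such an implementation, the conventional multiply-by-31 recipe combines each field's bits into the result while honoring the equals/hashCode contract; the Person class here is a made-up example:

```java
import java.util.Objects;

public class Person {
    private final String name;
    private final int age;

    public Person(String name, int age) { this.name = name; this.age = age; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return age == p.age && Objects.equals(name, p.name);
    }

    // Honors the contract: equal objects produce equal hash codes, and the
    // 31-multiplier mixes each field's contribution into the result so
    // values spread across buckets.
    @Override
    public int hashCode() {
        int result = Objects.hashCode(name);
        result = 31 * result + age;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new Person("Ada", 36).hashCode()
                == new Person("Ada", 36).hashCode()); // true
    }
}
```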

Answered Oct 07 '22 by Yishai