I am wondering about the parameters for constructing a ConcurrentHashMap:

initialCapacity is 16 by default (understood).
loadFactor is 0.75 by default.
concurrencyLevel is 16 by default.

My questions are:

What criteria should I use to adjust loadFactor up or down?
What criteria should I use to adjust concurrencyLevel up or down?

Additionally: what are the hallmarks of a good hashcode implementation?

Thank you!
ConcurrentHashMap() Creates a new, empty map with a default initial capacity (16), load factor (0.75) and concurrencyLevel (16).
A ConcurrentHashMap has an internal final class called Segment (in implementations prior to Java 8), so we can say that a ConcurrentHashMap is internally divided into segments. By default there are 16 of them, matching the default concurrencyLevel, so at most 16 threads can write at a time.
A good approach can be to initialize it like this: ConcurrentHashMap<String, Integer> instance = new ConcurrentHashMap<String, Integer>(16, 0.9f, 1); An initial capacity of 16 allows a reasonable number of elements to be added before any resizing happens.
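Here is a minimal runnable sketch of that initialization (the class name and the single put/get are just illustrative):

    import java.util.concurrent.ConcurrentHashMap;

    public class MapInit {
        public static void main(String[] args) {
            // capacity 16, load factor 0.9f, concurrency level 1:
            // a concurrency level of 1 suits a single expected writer
            ConcurrentHashMap<String, Integer> instance =
                    new ConcurrentHashMap<String, Integer>(16, 0.9f, 1);
            instance.put("answer", 42);
            System.out.println(instance.get("answer")); // prints 42
        }
    }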
ConcurrentHashMap is thread-safe, so multiple threads can operate on a single instance without any external synchronization. Internally, the map is divided into a number of segments according to the concurrency level, and by default it allows 16 threads to update the map concurrently (reads generally do not block at all).
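To illustrate that thread safety concretely (assuming Java 8+ for merge and lambdas; the class and key names here are hypothetical), two threads can increment the same counter with no external locking and no lost updates:

    import java.util.concurrent.ConcurrentHashMap;

    public class ConcurrentUpdates {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
            Runnable task = () -> {
                for (int i = 0; i < 1000; i++) {
                    // merge is atomic, so concurrent increments are never lost
                    counts.merge("hits", 1, Integer::sum);
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            System.out.println(counts.get("hits")); // always prints 2000
        }
    }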
The short answer: set "initial capacity" to roughly how many mappings you expect to put in the map, and leave the other parameters at their defaults.
Long answer:
load factor is the ratio between the number of elements and the number of "buckets" at which the map resizes: with a load factor of 0.75, the map grows once it is three-quarters full;
0.75 is usually a reasonable compromise; as I recall, it means that with a good hash function, on average we expect about 1.6 redirects to find an element in the map (or around that figure);
changing the load factor shifts the compromise between more redirects to find an element and less wasted space; but 0.75 is really usually a good value;
in principle, set concurrencyLevel to the number of concurrent threads you expect to have modifying the map, although overestimating this doesn't appear to have a bad effect other than wasting memory (I wrote a little on ConcurrentHashMap performance a while ago, in case you're interested).
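Putting those points together, a minimal sketch of the short answer (the expected size and writer-thread count below are hypothetical placeholders for your own estimates):

    import java.util.concurrent.ConcurrentHashMap;

    public class SizingSketch {
        public static void main(String[] args) {
            int expectedMappings = 10_000; // hypothetical expected map size
            int writerThreads = 8;         // hypothetical concurrent writers
            ConcurrentHashMap<String, Long> map =
                    new ConcurrentHashMap<String, Long>(
                            expectedMappings, // initial capacity: roughly the expected size
                            0.75f,            // load factor: keep the default
                            writerThreads);   // concurrency level: expected writers
            map.put("example", 1L);
            System.out.println(map.size()); // prints 1
        }
    }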
Informally, your hash function should essentially aim to have as much "randomness" in the bits as possible. Or more strictly, the hash code for a given element should give each bit a roughly 50% chance of being set. It's actually easier to illustrate this with an example: again, you may be interested in some stuff I wrote about how the String hash function works and associated hash function guidelines. Feedback is obviously welcome on any of this stuff.
One thing I also mention at some point is that you don't have to be too paranoid in practice: if your hash function produces a "reasonable" amount of randomness in some of the bits, then it will often be OK. In the worst case, sticking representative pieces of data into a string and taking the hash code of the string actually doesn't work so badly.
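For instance, a sketch of that string-based fallback, using a hypothetical Point class (not from the original answer):

    // Fallback approach: concatenate representative fields into a String
    // and reuse String.hashCode(), which mixes bits reasonably well.
    public class Point {
        private final int x;
        private final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            // the separator avoids collisions like (1, 23) vs (12, 3)
            return (x + "," + y).hashCode();
        }
    }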
Load factor is primarily related to the quality of the hash function. The closer to zero the load factor, the less likely there are to be collisions, even if the hash function isn't so great. The trade-off is that the memory footprint is larger. In other words, the HashMap isn't distributing the entries in separate buckets for each separate hashcode; it groups them by proximity, so the more buckets it has, the more spread out the distribution, and the less likely it is that there are collisions.
So the bottom line is you fiddle with load factor to improve lookup time or reduce memory, according to your needs and the objects you are storing in the Map.
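To make that trade-off concrete, a small sketch (the capacity and load factors are arbitrary) of how the load factor sets the resize threshold; the lower it is, the more buckets the map keeps per stored entry:

    public class LoadFactorSketch {
        public static void main(String[] args) {
            int capacity = 1024; // arbitrary bucket count
            for (float f : new float[] {0.5f, 0.75f, 1.0f}) {
                // resizing is triggered once size exceeds capacity * loadFactor
                System.out.printf("load factor %.2f -> resize after ~%d entries%n",
                        f, (int) (capacity * f));
            }
        }
    }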
ConcurrencyLevel really depends on your application. If you only have two or three threads running in the application, there you go. If you are an application server with an arbitrary number of threads, then you need to understand what your load capacity is and what point you want to optimize for.
A good quality hashcode implementation provides as wide a distribution across potential values of the object as possible with the least number of collisions, while honoring the contract. In other words, it allows the HashMap (or Set as the case may be) to distribute the objects into separate buckets making lookups faster.
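As a sketch of what such an implementation typically looks like (the Person class is hypothetical; Objects.hash requires Java 7+):

    import java.util.Objects;

    public class Person {
        private final String name;
        private final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return age == p.age && Objects.equals(name, p.name);
        }

        @Override
        public int hashCode() {
            // combines the same fields used by equals, honoring the contract:
            // equal objects always produce equal hash codes
            return Objects.hash(name, age);
        }
    }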