 

HashMap initialization parameters (load factor / initial capacity)

What values should I pass to create an efficient HashMap / HashMap-based structure for N items?

In an ArrayList, the efficient number is N (N already assumes future growth). What should the parameters be for a HashMap? ((int)(N * 0.75d), 0.75d)? More? Less? What is the effect of changing the load factor?

Ran Biron, asked Jan 12 '09



9 Answers

Regarding the load factor, I'll simply quote from the HashMap javadoc:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

In other words, the load factor should not be changed from 0.75 unless you have some specific optimization in mind. The initial capacity is the only thing you want to change: set it according to your N value, meaning (N / 0.75) + 1, or something in that area. This will ensure that the table is always large enough and no rehashing occurs.
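A minimal sketch of this sizing rule (the class and helper names are hypothetical, not from the JDK):

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapSizing {
    // Initial capacity for n expected entries with the default 0.75 load
    // factor, following the (N / 0.75) + 1 rule above.
    static int capacityFor(int n) {
        return (int) (n / 0.75) + 1;
    }

    public static void main(String[] args) {
        int n = 100;
        // capacityFor(100) == 134, so no rehash occurs while adding 100 entries
        Map<String, Integer> map = new HashMap<String, Integer>(capacityFor(n));
        for (int i = 0; i < n; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // prints 100
    }
}
```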

Yuval Adam, answered Oct 05 '22


I ran some unit tests to see if these answers were correct and it turned out that using:

(int) Math.ceil(requiredCapacity / loadFactor);

as the initial capacity gives what you want for either a HashMap or a Hashtable. By "what you want" I mean that adding requiredCapacity elements to the map won't cause the backing array to resize, and the array won't be larger than required. Since the default load factor is 0.75, initializing a HashMap like so works:

... = new HashMap<KeyType, ValueType>((int) Math.ceil(requiredCapacity / 0.75));

Since a HashSet is effectively just a wrapper for a HashMap, the same logic also applies there, i.e. you can construct a HashSet efficiently like this:

.... = new HashSet<TypeToStore>((int) Math.ceil(requiredCapacity / 0.75));

@Yuval Adam's answer is correct for all cases except where (requiredCapacity / 0.75) is a power of 2, in which case it allocates too much memory.
@NotEdible's answer uses too much memory in many cases, since the HashMap constructor itself already rounds the requested capacity up to a power of 2.

Mark Rhodes, answered Oct 05 '22


In the guava libraries from Google there is a function that creates a HashMap optimized for an expected number of items: newHashMapWithExpectedSize

from the docs:

Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth ...
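A plain-JDK sketch of what such a helper accomplishes (the capacity formula here is an assumption for illustration, not Guava's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class ExpectedSize {
    // Sketch of a helper like Guava's newHashMapWithExpectedSize: pick an
    // initial capacity large enough that expectedSize entries fit without
    // a resize under the default 0.75 load factor.
    static <K, V> Map<K, V> newHashMapWithExpectedSize(int expectedSize) {
        return new HashMap<K, V>((int) (expectedSize / 0.75f) + 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = newHashMapWithExpectedSize(50);
        for (int i = 0; i < 50; i++) {
            m.put("k" + i, i); // no resize expected while filling to 50
        }
        System.out.println(m.size()); // prints 50
    }
}
```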

linqu, answered Oct 06 '22


It's also notable that having a HashMap on the small side makes hash collisions more likely, which can slow down lookup. Hence, if you really worry about the speed of the map, and less about its size, it might be worth making it a bit too large for the data it needs to hold. Since memory is cheap, I typically initialise HashMaps for a known number of items with

HashMap<String, Foo> myMap = new HashMap<String, Foo>(numberOfElements * 2);

Feel free to disagree, in fact I'd quite like to have this idea verified or thrown out.

Zarkonnen, answered Oct 06 '22


The answer Yuval gave is only correct for Hashtable. HashMap uses power-of-two buckets, so for HashMap, Zarkonnen is actually correct. You can verify this from the source code:

  // Find a power of 2 >= initialCapacity
  int capacity = 1;
  while (capacity < initialCapacity)
      capacity <<= 1;

So, although the load factor of 0.75f is still the same between Hashtable and HashMap, you should use an initial capacity n*2 where n is the number of elements you plan on storing in the HashMap. This will ensure the fastest get/put speeds.
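The rounding loop quoted above can be run standalone to see the effect (this mirrors older HashMap source; the class here is an illustrative copy, not the JDK's):

```java
public class PowerOfTwo {
    // Smallest power of two >= initialCapacity, as in the quoted loop.
    static int roundUp(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUp(100)); // prints 128
        System.out.println(roundUp(134)); // prints 256
    }
}
```

Note how a requested capacity just above a power of two nearly doubles the allocated table.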

NotEdible, answered Oct 05 '22


Referring to the HashMap source code will help.

If the number of entries reaches the threshold (capacity * load factor), rehashing is done automatically. That means a load factor that is too small can cause frequent rehashing as the number of entries grows.
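The threshold computation is simple enough to sketch directly (class and method names are hypothetical):

```java
public class Threshold {
    // Threshold at which rehashing is triggered: capacity * loadFactor,
    // as described above.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default HashMap: capacity 16, load factor 0.75 -> resize after 12 entries
        System.out.println(thresholdFor(16, 0.75f)); // prints 12
        // A smaller load factor triggers rehashing much sooner
        System.out.println(thresholdFor(16, 0.25f)); // prints 4
    }
}
```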

grayger, answered Oct 05 '22


In most cases it is safe to initialize a List or Map with the following size parameters:

new ArrayList<T>(numElements + (numElements / 2));
new HashMap<K, V>(numElements + (numElements / 2));

This follows the 0.75 rule and saves a little overhead compared to the * 2 approach described above.

lv2program, answered Oct 06 '22


In an ArrayList, the efficient number is N (N already assumes future growth).

Erm, no it doesn't, unless I misunderstand what you're saying here. When you pass an integer into the ArrayList constructor, it creates an underlying array of exactly that size. If it turns out you need even a single extra element, the ArrayList will need to resize the underlying array on your next call to add(), making that call take much longer than it usually would.

If, on the other hand, you're talking about your value of N taking growth into account, then yes: if you can guarantee the value will never go above this, calling such an ArrayList constructor is appropriate. And in this case, as pointed out by Hank, the analogous constructor for a map would be N and 1.0f. This should perform reasonably well even if you do happen to exceed N (though if you expect this to occur regularly, you may wish to pass in a larger number for the initial size).
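A sketch of that "N and 1.0f" construction, assuming N is a hard upper bound (class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ExactSize {
    // Build a map sized for exactly n entries: capacity n, load factor 1.0,
    // so no resize occurs while filling it up to n.
    static Map<Integer, String> fill(int n) {
        Map<Integer, String> map = new HashMap<Integer, String>(n, 1.0f);
        for (int i = 0; i < n; i++) {
            map.put(i, "value" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(64).size()); // prints 64
    }
}
```

The trade-off discussed below still applies: with a load factor of 1.0, buckets are fuller on average, so lookups may be slower than with the default 0.75.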

The load factor, in case you weren't aware, is the point at which the map will have its capacity increased, as a fraction of the total capacity.

Edit: Yuval is probably right that it's a better idea to leave the load factor around 0.75 for a general purpose map. A load factor of 1.0 would perform brilliantly if your keys had sequential hashcodes (such as sequential integer keys), but for anything else you will likely run into collisions with the hash buckets, meaning that lookups take longer for some elements. Creating more buckets than is strictly necessary will reduce this chance of collision, meaning there's more chance of elements being in their own buckets and thus being retrievable in the shortest amount of time. As the docs say, this is a time vs space tradeoff. If either is particularly important to you (as shown by a profiler rather than prematurely optimising!) you can emphasize that; otherwise, stick with the default.

Andrzej Doyle, answered Oct 06 '22


For very large HashMaps in critical systems, where getting the initial capacity wrong can be very problematic, you may need empirical information to determine how best to initialize your Map.

CollectionSpy (collectionspy.com) is a new Java profiler that lets you see at a glance which HashMaps are close to needing rehashing, how many times they have been rehashed in the past, and more: an ideal tool for determining safe initial-capacity arguments for capacity-based container constructors.

Laurence Vanhelsuwe, answered Oct 05 '22