Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why HashMap initial capacity is not properly handled by the library?

To create HashMap/HashSet for N elements, we generally donew HashMap((int)(N/0.75F)+1) which is annoying.

Why the library has not taken care of this in the first place and allows initialization like new HashMap(N)(should not rehash till N elements) taking care of this calculation (int)(N/0.75F)+1?

Update 09 Nov 22:

Java 19 introduced HashMap<K,V> newHashMap(int numMappings)

Javadoc:

Creates a new, empty HashMap suitable for the expected number of mappings. The returned map uses the default load factor of 0.75, and its initial capacity is generally large enough so that the expected number of mappings can be added without resizing the map.

Similar methods were introduced in other Map classes too.

like image 726
Venkata Raju Avatar asked Sep 03 '25 05:09

Venkata Raju


2 Answers

Update

Updating to reflect changed question. No, there is no such standard API but it seems there is a method Maps.newHashMapWithExpectedSize(int) in guava:

Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth.


i have to initialize it to (int)(N/0.75F)+1

No you don't. If you create new HashMap from other Map, HashMap calculates capacity first by default:

public HashMap(Map<? extends K, ? extends V> m) {
    this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                  DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
    putAllForCreate(m);
}

If you add elements one by one, the same process happens as well:

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        //...
    }

    createEntry(hash, key, value, bucketIndex);
}

The only reason to use HashMap(int initialCapacity, float loadFactor) constructor is when you know from the very beginning how many elements you want to store in the HashMap, thus avoiding resizing and rehashing later (map has correct size from the very beginning).

One interesting implementation detail is that initial capacity is trimmed to the nearest power of two (see: Why ArrayList grows at a rate of 1.5, but for Hashmap it's 2?):

// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

So if you want your HashMap to have exact capacity as defined, just use powers of two.

Choosing different loadFactor allows you to trade space for performance - smaller value means more memory, but less collisions.

like image 109
Tomasz Nurkiewicz Avatar answered Sep 04 '25 18:09

Tomasz Nurkiewicz


I have run the following program

public static void main(String... args) throws IllegalAccessException, NoSuchFieldException {
    for (int i = 12; i < 80; i++) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>((int) Math.ceil(i / 0.75));
        int beforeAdding = Array.getLength(getField(map, "table"));
        for (int j = 0; j < i; j++) map.put(j, j);
        int afterAdding = Array.getLength(getField(map, "table"));
        map.put(i, i);
        int oneMore = Array.getLength(getField(map, "table"));
        System.out.printf("%,d: initial %,d, after N %,d, after N+1 %,d%n ",
                i, beforeAdding, afterAdding, oneMore);
    }
}

private static <T> T getField(Map<Integer, Integer> map, String fieldName) throws NoSuchFieldException, IllegalAccessException {
    Field table = map.getClass().getDeclaredField(fieldName);
    table.setAccessible(true);
    return (T) table.get(map);
}

which prints out

 12: initial 16, after N 16, after N+1 32
 13: initial 32, after N 32, after N+1 32
 .. deleted ..
 24: initial 32, after N 32, after N+1 64
 25: initial 64, after N 64, after N+1 64
 .. deleted ..
 47: initial 64, after N 64, after N+1 64
 48: initial 64, after N 64, after N+1 128
 49: initial 128, after N 128, after N+1 128
 .. deleted ..
 79: initial 128, after N 128, after N+1 128

This shows that the default initialiser the initial capacity is rounded to the next power of two. The problem with this value is that if you want this to be the eventual size, you have to take into account the load factor if you want to avoid resizing. Ideally you shouldn't have to, in the way the Map copy constructor does for you.

like image 35
Peter Lawrey Avatar answered Sep 04 '25 19:09

Peter Lawrey