Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Different default 'initialCapacity' HashSet and LinkedHashSet

Tags:

java

When constructing a HashSetand a LinkedHashSet from a collection, the initialCapacity is set to different values in the default implementation.

HashSet:

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

LinkedHashSet:

public LinkedHashSet(Collection<? extends E> c) {
    super(Math.max(2*c.size(), 11), .75f, true);
    addAll(c);
}

I'm sure there is a perfectly valid reason for this, but I fail to see it.

like image 471
marthursson Avatar asked Jan 17 '17 08:01

marthursson


1 Answers

From the code you showed us, here are the specs for HashSet and LinkedHashSet:

data structure | initial capacity      | load factor
HashSet        | max(1.333 * size, 16) | 0.75
LinkedHashSet  | max(2 * size, 11)     | 0.75

Off the top of my head, it is probably more costly to rehash a LinkedHashSet than a plain HashSet, as the former has a linked list running through it, which might also have to be refactored/recalculated. By making the initial capacity greater, we might avoid exceeding the initial capacity for some typical use cases.

When the initial capacity of a hashtable data structure is exceeded in Java, it needs to be expanded. This requires, among other things, that every entry in the table needs to be rehashed to a new bucket. The cost of doing this should be roughly the same in both LinkedHashSet and plain HashSet. However, a LinkedHashSet has an additional requirement when expanding the capacity, because it maintains a linked list running through the entries. This list might also need to be refactored in the process. Hence, I would expect the cost of expanding capacity to be higher in LinkedHashSet than plain HashSet. By giving LinkedHashSet a greater initial capacity, we can avoid this costly expansion of capacity for a longer time.

LinkedHashSet Javadoc

like image 127
Tim Biegeleisen Avatar answered Oct 06 '22 08:10

Tim Biegeleisen