Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Choosing the initial capacity of a HashSet with an expected number of unique values and insertions

Tags:

java

set

Ok, here's my situation:

I have an Array of States, which may contain duplicates. To get rid of the duplicates, I can add them all to a Set.

However when I create the Set, it wants the initial capacity and load factor to be defined, but what should they be set to?

From googling, I have come up with:

String[] allStates = getAllStates();
Set<String> uniqueStates = new HashSet<String>(allStates.length, 0.75);

The problem with this, is that allStates can contain anwhere between 1 and 5000 states. So the Set will have capacity of over 5000, but only containing at most 50.

So alternatively set the max size of the Set could be set to be the max number of states, and the load factor to be 1.

I guess my questions really are:

  • What should you set the initial capacity to be when you don't know how many items are to be in the Set?
  • Does it really matter what it gets set to when the most it could contain is 50?
  • Should I even be worrying about it?
like image 727
Clarkey Avatar asked Feb 19 '09 11:02

Clarkey


People also ask

What is the initial capacity of HashSet?

Constructs a new, empty set; the backing HashMap instance has default initial capacity (16) and load factor (0.75).

Does HashSet contain unique values?

HashSet not only stores unique Objects but also a unique Collection of Objects like ArrayList<E>, LinkedList<E>, Vector<E>,..etc.

How do I find the size of a HashSet?

HashSet. size() method is used to get the size of the HashSet or the number of elements present in the HashSet. Parameters: This method does not takes any parameter. Return Value: The method returns the size or the number of elements present in the HashSet.


2 Answers

Assuming that you know there won't be more than 50 states (do you mean US States?), the

Set<String> uniqueStates = new HashSet<String>(allStates.length, 0.75);

quoted is definitely wrong. I'd suggest you go for an initial capacity of 50 / 0.75 = 67, or perhaps 68 to be on the safe side.

I also feel the need to point out that you're probably overthinking this intensely. Resizing the arraylist twice from 16 up to 64 isn't going to give you a noticeable performance hit unless this is right in the most performance-critical part of the program.

So the best answer is probably to use:

new HashSet<String>();

That way, you won't come back a year later and puzzle over why you chose such strange constructor arguments.

like image 71
Zarkonnen Avatar answered Oct 02 '22 14:10

Zarkonnen


Use the constructor where you don't need to specify these values, then reasonable defaults are chosen.

like image 33
starblue Avatar answered Oct 02 '22 14:10

starblue