Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can a HashSet offer constant time add operation?

I was reading the javadocs on HashSet when I came across the interesting statement:

This class offers constant time performance for the basic operations (add, remove, contains and size)

This confuses me greatly, as I don't understand how one could possibly get constant time, O(1), performance for a comparison operation. Here are my thoughts:

If this is true, then no matter how much data I'm dumping into my HashSet, I will be able to access any element in constant time. That is, if I put 1 element in my HashSet, it will take the same amount of time to find it as if I had a googolplex of elements.

However, this wouldn't be possible if I had a constant number of buckets, or a consistent hash function, since for any fixed number of buckets, the number of elements in that bucket will grow linearly (albeit slowly, if the number is big enough) with the number of elements in the set.

Then, the only way for this to work is to have a changing hash function every time you insert an element (or every few times). A simple hash function that never any collisions would satisfy this need. One toy example for strings could be: Take the ASCII value of the strings and concatenate them together (because adding could result in a conflict).

However, this hash function, and any other hash function of this sort will likely fail for large enough strings or numbers etc. The number of buckets that you can form is immediately limited by the amount of stack/heap space you have, etc. Thus, skipping locations in memory can't be allowed indefinitely, so you'll eventually have to fill in the gaps.

But if at some point there's a recalculation of the hash function, this can only be as fast as finding a polynomial which passes through N points, or O(nlogn).

Thus arrives my confusion. While I will believe that the HashSet can access elements in O(n/B) time, where B is the number of buckets it has decided to use, I don't see how a HashSet could possibly perform add or get functions in O(1) time.

Note: This post and this post both don't address the concerns I listed..

like image 550
Teofrostus Avatar asked May 28 '15 10:05

Teofrostus


People also ask

Is HashSet constant time?

hashset is implemented using a hash table. elements are not ordered. the add, remove, and contains methods has constant time complexity o(1).

What is the time complexity of HashSet?

It runs in O(1) expected time, as any hash table (assuming the hash function is decent). It is backed by a HashMap where the key is the Object.

Which method is used to add an item to a HashSet?

HashSet. add() method in Java HashSet is used to add a specific element into a HashSet. This method will add the element only if the specified element is not present in the HashSet else the function will return False if the element is already present in the HashSet.

Does HashSet allow duplicates?

Duplicates: HashSet doesn't allow duplicate values. HashMap stores key, value pairs and it does not allow duplicate keys. If the key is duplicate then the old key is replaced with the new value.


1 Answers

The number of buckets is dynamic, and is approximately ~2n, where n is the number of elements in the set.

Note that HashSet gives amortized and average time performance of O(1), not worst case. This means, we can suffer an O(n) operation from time to time.
So, when the bins are too packed up, we just create a new, bigger array, and copy the elements to it.
This costs n operations, and is done when number of elements in the set exceeds 2n/2=n, so it means, the average cost of this operation is bounded by n/n=1, which is a constant.

Additionally, the number of collisions a HashMap offers is also constant on average.

Assume you are adding an element x. The probability of h(x) to be filled up with one element is ~n/2n = 1/2. The probability of it being filled up with 3 elements, is ~(n/2n)^2 = 1/4 (for large values of n), and so on and so on.
This gives you an average running time of 1 + 1/2 + 1/4 + 1/8 + .... Since this sum converges to 2, it means this operation takes constant time on average.

like image 60
amit Avatar answered Sep 19 '22 12:09

amit