Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the lookup time complexity of HashSet<T>(IEqualityComparer<T>)?

In C#.NET, I like using HashSets because of their supposed O(1) time complexity for lookups. If I have a large set of data that is going to be queried, I often prefer using a HashSet to a List, since it has this time complexity.

What confuses me is the constructor for the HashSet, which takes IEqualityComparer as an argument:

http://msdn.microsoft.com/en-us/library/bb359100.aspx

In the link above, the remarks note that the "constructor is an O(1) operation," but if this is the case, I am curious if lookup is still O(1).

In particular, it seems to me that, if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).

Does the implementation internally construct a lookup table as elements are added to the collection?

In general, how might I ascertain information about complexity of .NET data structures?

like image 418
Kirby Avatar asked Mar 21 '12 20:03

Kirby


People also ask

What's the lookup time for a HashSet?

On average, the contains() of HashSet runs in O(1) time. Getting the object's bucket location is a constant time operation. Taking into account possible collisions, the lookup time may rise to log(n) because the internal bucket structure is a TreeMap.

What is the complexity of the lookup?

Lookup is O(1) but insert is amortized O(1). @Kirby That doesn't change. You could construct the HashSet from an IEnumerable or add the elements individually later: the only thing that might be different, which does not affect the [lookup] time complexity, is the capacity.

What is the space complexity of HashSet in Java?

It's both: O(n+k).

Is HashSet ordered in C#?

There's OrderedHashSet<T> collection: based on classic HashSet<T> source code (from . NET Core) preserves order of insertions and allows manual reordering.


1 Answers

A HashSet works via hashing (via IEqualityComparer.GetHashCode) the objects you insert and tosses the objects into buckets per the hash. The buckets themselves are stored in an array, hence the O(1) part.

For example (this is not necessarily exactly how the C# implementation works, it just gives a flavor) it takes the first character of the hash and throws everything with a hash starting with 1 into bucket 1. Hash of 2, bucket 2, and so on. Inside that bucket is another array of buckets that divvy up by the second character in the hash. So on for every character in the hash....

Now, when you look something up, it hashes it, and jumps thru the appropriate buckets. It has to do several array lookups (one for each character in the hash) but does not grow as a function of N, the number of objects you've added, hence the O(1) rating.

To your other question, here is a blog post with the complexity of a number of collections' operations: http://c-sharp-snippets.blogspot.com/2010/03/runtime-complexity-of-net-generic.html

like image 69
Scott Stafford Avatar answered Oct 12 '22 14:10

Scott Stafford