In C#.NET, I like using HashSets because of their supposed O(1) time complexity for lookups. If I have a large set of data that is going to be queried, I often prefer using a HashSet to a List, since it has this time complexity.
What confuses me is the constructor for the HashSet, which takes IEqualityComparer as an argument:
http://msdn.microsoft.com/en-us/library/bb359100.aspx
In the link above, the remarks note that the "constructor is an O(1) operation," but if this is the case, I am curious if lookup is still O(1).
In particular, it seems to me that, if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).
Does the implementation internally construct a lookup table as elements are added to the collection?
In general, how might I ascertain information about complexity of .NET data structures?
On average, the contains() of HashSet runs in O(1) time. Getting the object's bucket location is a constant time operation. Taking into account possible collisions, the lookup time may rise to log(n) because the internal bucket structure is a TreeMap.
Lookup is O(1) but insert is amortized O(1). @Kirby That doesn't change. You could construct the HashSet from an IEnumerable or add the elements individually later: the only thing that might be different, which does not affect the [lookup] time complexity, is the capacity.
It's both: O(n+k).
There's OrderedHashSet<T> collection: based on classic HashSet<T> source code (from . NET Core) preserves order of insertions and allows manual reordering.
A HashSet
works via hashing (via IEqualityComparer.GetHashCode
) the objects you insert and tosses the objects into buckets per the hash. The buckets themselves are stored in an array, hence the O(1) part.
For example (this is not necessarily exactly how the C# implementation works, it just gives a flavor) it takes the first character of the hash and throws everything with a hash starting with 1 into bucket 1. Hash of 2, bucket 2, and so on. Inside that bucket is another array of buckets that divvy up by the second character in the hash. So on for every character in the hash....
Now, when you look something up, it hashes it, and jumps thru the appropriate buckets. It has to do several array lookups (one for each character in the hash) but does not grow as a function of N, the number of objects you've added, hence the O(1) rating.
To your other question, here is a blog post with the complexity of a number of collections' operations: http://c-sharp-snippets.blogspot.com/2010/03/runtime-complexity-of-net-generic.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With