Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashSet<T> versus Dictionary<K, V> w.r.t searching time to find if an item exists

People also ask

What is the difference between HashSet and Dictionary?

A HashSet, similar to a Dictionary, is a hash-based collection, so look ups are very fast with O(1). But unlike a dictionary, it doesn't store key/value pairs; it only stores values. So, every objects should be unique and this is determined by the value returned from the GetHashCode method.

Why HashSet is faster than list?

The result clearly shows that the HashSet provides faster lookup for the element than the List. This is because of no duplicate data in the HashSet. The HashSet maintains the Hash for each item in it and arranges these in separate buckets containing hash for each character of item stored in HashSet.

Is a Dictionary a HashSet?

The real difference is the fact that with a Dictionary we can create key-value pairs (with the keys being unique), while with an HashSet we're storing an unordered set of unique items.

Which is better HashSet or list?

ArrayList maintains the insertion order i.e order of the object in which they are inserted. HashSet is an unordered collection and doesn't maintain any order. ArrayList allows duplicate values in its collection. On other hand duplicate elements are not allowed in Hashset.


HashSet vs List vs Dictionary performance test, taken from here.

Add 1000000 objects (without checking duplicates)

Contains check for half the objects of a collection of 10000

Remove half the objects of a collection of 10000


I assume you mean Dictionary<TKey, TValue> in the second case? HashTable is a non-generic class.

You should choose the right collection for the job based on your actual requirements. Do you actually want to map each key to a value? If so, use Dictionary<,>. If you only care about it as a set, use HashSet<>.

I would expect HashSet<T>.Contains and Dictionary<TKey, TValue>.ContainsKey (which are the comparable operations, assuming you're using your dictionary sensibly) to basically perform the same - they're using the same algorithm, fundamentally. I guess with the entries in Dictionary<,> being larger you end up with a greater likelihood of blowing the cache with Dictionary<,> than with HashSet<>, but I'd expect that to be insignificant compared with the pain of choosing the wrong data type simply in terms of what you're trying to achieve.


From MSDN documentation for Dictionary<TKey,TValue>

"Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table."

With a note:

"The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey"

I know your question/post is old - but while looking for an answer to a similar question I stumbled across this.

Hope this helps. Scroll down to the Remarks section for more details. https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.110).aspx


These are different data structures. Also there is no generic version of HashTable.

HashSet contains values of type T which HashTable (or Dictionary) contains key-value pairs. So you should choose collection on what data you need to be stored.