I have been recently drilled in a couple of interviews about Hashtables and when is it neccessary to override the GetHashCode(). The discussion kept going deeper and deeper until I threw in the towel.
I am now doing some research to cover everything to be ready for next time.
I have found this excellent article that I would like to share: http://msdn.microsoft.com/en-us/library/ms379571(VS.80).aspx#datastructures20_2_topic5
1) Something I don't feel very comfortable with are the fact that Dictionaries are Hash based, but Lists are apparently not. Does that only mean that searching in a List<> and Array[] is linear, while searching in a dictionary or hashtable is constant and therefore much faster? Is this all to it?
2) If I use a class as a key in a dictionary, I need to override GetHashcode() on that class based on any required identification fields to make the instances unique. However it still could happen that both ID fields are equal and the same hashcode would be generated? If this is the case what happens during a collision of the two instances with the same hashcode?
3) How can the collision be resolved? I read in the article about rehashing methodology in case of collision for Hashtable and Chaining for the Dictionary. But I am still not sure how it works exactly as I am not a Math genius. :-\ Can anybody explain it better how it works?
Many Thanks, Kave
A hash table is a data structure that is used to store keys/value pairs. It uses a hash function to compute an index into an array in which an element will be inserted or searched. By using a good hash function, hashing can work well.
If we have too small a hash table for the data set size and/or a bad hash function, elements can start to build in one index in the array. Theoretically, all n element could end up in the same linked list.
Applications Of Hash Tables: Hash tables are used as disk-based data structures and database indexing. They can be used to implement caches mainly used to that are used to speed up the access to data. Several dynamic programming languages like Python, JavaScript, and Ruby use hash tables to implement objects.
Hash tables tend to be faster when it comes to searching for items. In arrays, you have to loop over all items before you find what you are looking for while in a hash table you go directly to the location of the item. Inserting an item is also faster in Hash tables since you just hash the key and insert it.
1) In general, yes, a Dictionary<T>
or HashSet<T>
has constant time access. Locating an item in an unsorted List<T>
or Array must be done linearly. Sorted collections let you do binary searches, giving O(log n) access time.
2) If you override GetHashCode
in .NET, you should also override the Equals
method. In .NET Dictionary
and HashSet
, you can't insert items that are equal. Hash collisions are unavoidable in the general case (unless you have computed a perfect hash). There are several ways to resolve collisions.
3) For more information about collision resolution, see http://en.wikipedia.org/wiki/Hash_table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With