Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does the process of hashing work in Dictionary<TKey, TValue>

Tags:

c#

.net

How does the process of hashing work in Dictionary? I read that using dictionary provides faster look up. But did not understand how? How does the hashing and mapping to an index happen? Couldn't find any good reference.

EDIT: How is the actual memory location where the object is stored obtained from the result of the hashing function?

like image 654
softwarematter Avatar asked Sep 10 '09 20:09

softwarematter


People also ask

How does C# dictionary work?

A dictionary, also called an associative array, is a collection of unique keys and a collection of values, where each key is associated with one value. Retrieving and adding values is very fast. Dictionaries take more memory because for each value there is also a key.

Is dictionary a type of hash table?

Hashtable and Dictionary are collection of data structures to hold data as key-value pairs. Dictionary is generic type, hash table is not a generic type. The Hashtable is a weakly typed data structure, so you can add keys and values of any Object Type to the Hashtable.

What is a hash table dictionary?

A Hashtable is a collection of key/value pairs that are arranged based on the hash code of the key. Or in other words, a Hashtable is used to create a collection which uses a hash table for storage. It is the non-generic type of collection which is defined in System. Collections namespace.

Does dictionary use Hashcode?

-1 Dictionary does NOT use GetHashCode() to determine if two keys are equal. That is, a dictionary can contain separate entries whose keys have the same hash code. The dictionary may be less efficient, but it will still work.


2 Answers

A hash table or dictionary is a data structure that stores key-value pairs. The advantage of the hash table is that given a key finding the corresponding value is pretty fast. Simplified, the time to find a key-value pair in the hash table does not depend on the size of the table. Compare that to storing the key-value pairs in a list or an array. To find a key-value pair you would have to search the list from the beginning until a matching key was found. The longer the list the more time it would take to find the key-value pair. Using big-O notation you can say that looking up a key in a hash table is of order O(1) while looking up a key in a list by using linear search is of order O(N) (simplified).

To insert a key-value pair in the hash table you will first have to compute the hash code of the key. In .NET all objects have a method named GetHashCode that returns a hash code (32 bit integer) for that particular object. It is important that equal objects return the same hash code, but also very useful if different objects return different hash codes. Beware of the misconception that different objects cannot return the same hash code - they can, but it will result in a collision (see below).

As an example consider the hash codes of two strings:

"Boo" 0x598FD95A
"Foo" 0x598FD8DE

Even though the strings are very similar they have different hash codes.

I am simplifying things a bit here to focus on the important aspects of a hash table so for now let us say that Internally Dictionary<TKey, TValue> stores the key-value pairs in an array. To locate the index in this array where the key-value pair will be stored you have to compute the hash code of the key modulo the size of the array. Assume the size of the array is 5:

Index("Boo") = 0x598FD95A % 5 = 4
Index("Foo") = 0x598FD8DE % 5 = 0

This leads to this internal hash table array:

+---+---------+
| 0 | "Foo"   |
+---+---------+
| 1 | (empty) |
+---+---------+
| 2 | (empty) |
+---+---------+
| 3 | (empty) |
+---+---------+
| 4 | "Boo"   |
+---+---------+

Looking up an entry in the hash table is very fast. You simply have to compute the hash code of the key modulo the size of the internal array and retrieve the string at that index.

Now consider the key "Zoo":

Index("Zoo") = 0x598FDC62 % 5 = 0

It has the same index as the key "Foo". This results in what is called a collision. A proper implementation of a hash table will have to handle collisions and there are different strategies for doing that. Also, as the internal array fills up there will be fewer and fewer empty elements in the array resulting in an increasing number of collisions. The load factor is the ratio between used elements and total elements in the internal array. In the example above the load factor is 2/5 = 0.4. Most hash table implementations will increase the size of the internal array when the load factor exceeds a certain threshold.

If you want to learn more about some of these concepts you will have to study some of the more comprehensive resources linked in other answers.

like image 93
Martin Liversage Avatar answered Sep 19 '22 05:09

Martin Liversage


The hashing process in an Dictionary uses a technique that's refered to as chaining. With chaining, a secondary data structure is utilized to hold any collisions. Specifically, each slot in the Dictionary has an array of elements that map to a bucket. In the event of a collision, the colliding element is prepended to the bucket's list.

See this article on MSDN for more details.

like image 40
Mez Avatar answered Sep 21 '22 05:09

Mez