Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashtable collision rehashing - how are values read?

Tags:

c#

hashtable

I am trying to understand how Hashtables work in C#. I read the MSDN article and I understand that C# Hashtables use 'rehashing' for collisions, i.e. if I try to insert a key/value pair into the hashtable, if using HashFunction H1 results in a collision, then it will try HashFunction H2, H3, etc, until no collisions are found.

MSDN quote:

The Hashtable class uses a different technique referred to as rehasing. (Some sources refer to rehashing as double hashing.)

Rehashing works as follows: there is a set of hash different functions, H1 ... Hn, and when inserting or retrieving an item from the hash table, initially the H1 hash function is used. If this leads to a collision, H2 is tried instead, and onwards up to Hn if needed. The previous section showed only one hash function, which is the initial hash function (H1). The other hash functions are very similar to this function, only differentiating by a multiplicative factor. In general, the hash function Hk is defined as:

Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1)))] % hashsize

However, taking the example from the MSDN site1:

private static Hashtable employees = new Hashtable();

public static void Main()
{
    // Add some values to the Hashtable, indexed by a string key
    employees.Add("111-22-3333", "Scott");
    employees.Add("222-33-4444", "Sam");
}

Let's assume that adding the second key will result in a collision, so H2 will have to be used. However, when I call employees["222-33-4444"], how does the hashtable know to use H2? Is there a separate mapping? Thanks.

like image 886
user981225 Avatar asked Feb 06 '12 20:02

user981225


Video Answer


1 Answers

Hash tables store both the key and the value in the hash table itself. This way later on during operations such as hash table look-ups it can be guaranteed that the value found is the one that matches the index used for the look-up. Hash tables use a simple "try the basic method of look-up until success" methodology. In this case, the method of look-up is "use hash function X" where X changes on failure.

In other schemes, the method of look-up is "look at the table entry X" (as determined by a hash function) where X just increases by one in a wrapping manner each failure.

The nagging question now is what happens when the value ISN'T in the table? Well, that can be rather ugly: When you've either hit an entry in the table which is missing or, even worse, when you've iterated through as many entries as are stored in the table, you can be sure the entry isn't there -- but that can take "a while" in the worst case.

Keep in mind that since only one value can be associated with one key, once you've found the key, you've found the value. The worst a hash table can do is having to do the equivalent of a cache-unfriendly linear search over all the values in the hash table itself... but ultimately, it will find the value if it's there because it's comparing the stored key to the requested key to test if it's there. The only optimization closed hash tables make is where to look first -- in this case, where hash function 1 says, and then 2, and then 3...

like image 104
Kaganar Avatar answered Sep 17 '22 21:09

Kaganar