How well does .NET dictionary resolve collisions?

I have a problem with a custom object that needs to be keyed for a table. I need to generate a unique numeric key. I'm having collision problems and I'm wondering if I can leverage a dictionary to help me. Assume I have an object like this:

class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;
}

and so on with more fields. Let's say Foo and Bar are my key fields: if they're equal between two Thingys, then the two objects should be considered equal (one may represent an update to the other, with the Others fields being updated). So I have these:

public override bool Equals(object obj)
{
    Thingy thing = obj as Thingy; // null if obj is null or not a Thingy
    if (thing == null) return false;
    return this.Foo == thing.Foo && this.Bar == thing.Bar;
}

public override int GetHashCode()
{
    return (this.Foo + this.Bar).GetHashCode(); // using default string impl
}

This works for the most part, but there are rare occasions where two Thingys that are actually different have the same hash code.

My question is this: could I use a Dictionary<Thingy, int> where I put in my Thingys, and use a sequential value coming out of the dictionary as my actual key? I'm wondering if the Dictionary, when detecting a rare hash code collision, will call my Equals method, determine that the objects are actually different, and store them separately. I imagine that when looking an item up, it would find the bucket for that hash and search it for the correct Thingy, again using Equals for comparison.

Is this the case with dictionary, or does it only resolve collisions where the hash code is different, but (hash % size) is the same? If this won't work, what might?

Tesserex asked Feb 10 '10 20:02


2 Answers

Hash collisions only affect performance, not integrity.

A simple test would be to change GetHashCode() to simply return 1. You'll see that the dictionary still behaves properly, but with any reasonably large dataset it will perform terribly, because every lookup has to compare against the colliding entries one by one using Equals().
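Here is a minimal sketch of that experiment. CollidingKey and ConstantHashDemo are hypothetical names made up for this illustration (they are not from the question); the only point is that every key reports the same hash code, yet lookups still return the right values because Equals() does the final comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type for illustration: every instance has the same hash code.
public sealed class CollidingKey
{
    public readonly string Foo;
    public readonly string Bar;

    public CollidingKey(string foo, string bar)
    {
        Foo = foo;
        Bar = bar;
    }

    public override bool Equals(object obj)
    {
        CollidingKey other = obj as CollidingKey;
        return other != null && Foo == other.Foo && Bar == other.Bar;
    }

    // Deliberately terrible: forces every key to collide with every other key.
    public override int GetHashCode()
    {
        return 1;
    }
}

public static class ConstantHashDemo
{
    // Fills a dictionary with 'count' distinct keys that all share one hash code.
    public static Dictionary<CollidingKey, int> Build(int count)
    {
        var map = new Dictionary<CollidingKey, int>();
        for (int i = 0; i < count; i++)
        {
            map[new CollidingKey("foo" + i, "bar")] = i;
        }
        return map;
    }
}
```

Calling ConstantHashDemo.Build(100) should give a dictionary with 100 entries in which each key still maps to its own value. It just gets slow as the count grows, since all entries end up in one chain.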

Bob answered Sep 19 '22 05:09


Hash collisions will primarily affect performance, not correctness, so long as Equals() behaves correctly.

Dictionary uses the hash code as a way to organize items into separate "buckets". If too many items share the same hash code, you can run into performance problems. However, as long as Equals() can correctly distinguish between instances, you should get correct results.
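Applied to the question, a sketch of the Dictionary<Thingy, int> idea might look like the following. KeyedThingy and ThingyKeyTable are hypothetical names for this example, and the counter-based numbering is just one assumed way to hand out sequential values. The two sample keys ("ab", "c") and ("a", "bc") are chosen because concatenating Foo and Bar yields the same string for both, so the question's GetHashCode() is guaranteed to collide for them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Thingy, with the same Equals/GetHashCode logic.
public sealed class KeyedThingy
{
    public readonly string Foo;
    public readonly string Bar;

    public KeyedThingy(string foo, string bar)
    {
        Foo = foo;
        Bar = bar;
    }

    public override bool Equals(object obj)
    {
        KeyedThingy other = obj as KeyedThingy;
        return other != null && Foo == other.Foo && Bar == other.Bar;
    }

    // Same approach as in the question: ("ab", "c") and ("a", "bc") hash identically.
    public override int GetHashCode()
    {
        return (Foo + Bar).GetHashCode();
    }
}

// Hands out a sequential numeric key per distinct (Foo, Bar) pair.
public sealed class ThingyKeyTable
{
    private readonly Dictionary<KeyedThingy, int> ids = new Dictionary<KeyedThingy, int>();
    private int nextId = 1;

    public int GetOrAddId(KeyedThingy thing)
    {
        int id;
        if (!ids.TryGetValue(thing, out id))
        {
            // Not seen before (according to Equals), so assign the next number.
            id = nextId++;
            ids.Add(thing, id);
        }
        return id;
    }
}
```

With this, GetOrAddId(new KeyedThingy("ab", "c")) and GetOrAddId(new KeyedThingy("a", "bc")) should return different numbers even though the hash codes match, and asking again for an equal Thingy should return the number it was given the first time.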

Where hash codes can result in problems is with mutable objects. If your Thingy class allows Foo or Bar to change for an item in the dictionary, you may then fail to find it in a subsequent access attempt. This is because the hash code produced now differs from the one used to store the value in the dictionary.
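To make the mutable-key problem concrete, here is a small sketch. MutableThingy and MutableKeyDemo are hypothetical names for this illustration; the class mirrors the question's Equals/GetHashCode but leaves the key fields writable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the question's Thingy whose key fields can be changed.
public sealed class MutableThingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj)
    {
        MutableThingy other = obj as MutableThingy;
        return other != null && Foo == other.Foo && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        return (Foo + Bar).GetHashCode();
    }
}

public static class MutableKeyDemo
{
    // Adds one key, then changes a key field while it is still in the dictionary.
    public static Dictionary<MutableThingy, int> BuildWithMutatedKey(out MutableThingy key)
    {
        key = new MutableThingy { Foo = "foo", Bar = "bar" };
        var map = new Dictionary<MutableThingy, int>();
        map.Add(key, 1);

        // The dictionary filed the entry under the hash of "foobar";
        // after this assignment the key hashes as "changedbar" instead.
        key.Foo = "changed";
        return map;
    }
}
```

After BuildWithMutatedKey, the dictionary still holds one entry, but map.ContainsKey(key) will almost certainly be false because the lookup is done with the new hash code. Looking up a fresh Thingy with the original "foo"/"bar" values fails as well: it lands on the right entry by hash, but Equals() no longer matches the mutated key. If the key fields have to change, one option is to remove the item before changing them and add it back afterwards.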

LBushkin answered Sep 21 '22 05:09