I have a problem with a custom object that needs to be keyed for a table. I need to generate a unique numeric key. I'm having collision problems and I'm wondering if I can leverage a dictionary to help me. Assume I have an object like this:
class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;
}
and so on, with more fields. Let's say Foo and Bar are my key fields. If they're equal between two Thingys, then the two objects should be considered equal (one may represent an update to the other, with the Others fields being updated). So I have these:
public override bool Equals(object obj)
{
    Thingy thing = (Thingy)obj; // yes I do type check first
    return (this.Foo == thing.Foo && this.Bar == thing.Bar);
}

public override int GetHashCode()
{
    return (this.Foo + this.Bar).GetHashCode(); // using default string impl
}
So this works for the most part, but there are rare occasions where two Thingys that are actually different have the same hash code.
My question is this: could I use a Dictionary<Thingy, int> where I put in my Thingys, and use a sequential value coming out of the dictionary as my actual key? I'm wondering if the Dictionary, when detecting a rare hash code collision, will call my Equals method, determine that the objects are actually different, and store them separately. I imagine that when looking one up, it would find the bucket for that hash and search it for the correct Thingy, again using Equals for comparison.
Is this the case with dictionary, or does it only resolve collisions where the hash code is different, but (hash % size) is the same? If this won't work, what might?
A dictionary, also called an associative array, is a collection of unique keys and a collection of values, where each key is associated with one value. Retrieving and adding values is very fast. Dictionaries take more memory because for each value there is also a key.
The .NET Dictionary implementation conceptually uses chaining as its collision resolution method, but it doesn't use a separate data structure (like a linked list) to store the items in a chain. Instead, it stores every entry in the same array.
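To make the "chaining inside one array" idea concrete, here is a minimal sketch. It is not the real Dictionary source: TinyChainedMap, its fields, and its fixed capacity are all made up for illustration, and it supports neither resizing nor removal. Each bucket holds the index of the first entry in its chain, and each entry holds the index of the next one.

```csharp
using System;

// Simplified illustration only; NOT the actual Dictionary<TKey, TValue> code.
public class TinyChainedMap
{
    private struct Entry
    {
        public int HashCode; // cached, non-negative hash of the key
        public int Next;     // index of the next entry in the same chain, or -1
        public string Key;
        public int Value;
    }

    private readonly int[] buckets;   // buckets[b] = index of the first entry in chain b, or -1
    private readonly Entry[] entries; // every entry lives in this single array
    private int count;

    public TinyChainedMap(int bucketCount, int capacity)
    {
        buckets = new int[bucketCount];
        for (int i = 0; i < bucketCount; i++) buckets[i] = -1;
        entries = new Entry[capacity];
    }

    // Finds the entry index for a key (assumed non-null), or -1 if absent.
    private int FindEntry(string key, int hash)
    {
        for (int i = buckets[hash % buckets.Length]; i >= 0; i = entries[i].Next)
        {
            // Compare cached hashes first (cheap), then fall back to equality.
            if (entries[i].HashCode == hash && entries[i].Key == key) return i;
        }
        return -1;
    }

    public void Add(string key, int value)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF; // clear the sign bit
        if (FindEntry(key, hash) >= 0) throw new ArgumentException("Duplicate key.");
        if (count == entries.Length) throw new InvalidOperationException("Full (this sketch never resizes).");

        int bucket = hash % buckets.Length;
        // The new entry becomes the head of its chain and points at the old head.
        entries[count] = new Entry { HashCode = hash, Next = buckets[bucket], Key = key, Value = value };
        buckets[bucket] = count;
        count++;
    }

    public bool TryGetValue(string key, out int value)
    {
        int i = FindEntry(key, key.GetHashCode() & 0x7FFFFFFF);
        value = i >= 0 ? entries[i].Value : 0;
        return i >= 0;
    }
}
```

Constructing the map with a single bucket, for example new TinyChainedMap(1, 3), forces every key into one chain, yet each key is still found because the walk ends with an equality check.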
@BrianJ: Both Hashtable (class) and Dictionary (class) are hash tables (concept), but a Hashtable is not a Dictionary, nor is a Dictionary a Hashtable.
Hash collisions only affect performance, not integrity.
A simple test would be to change GetHashCode() to simply return 1;. You'll note that the dictionary still behaves properly, but with any reasonable dataset, it will perform terribly.
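That experiment can be sketched roughly as follows. ConstantHashThingy and ConstantHashDemo are names invented for this example (a copy of Thingy with the hash code pinned to 1); nothing here comes from the original post beyond the Foo/Bar key fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical copy of Thingy whose hash code is deliberately constant,
// so every instance collides with every other instance.
public class ConstantHashThingy
{
    public string Foo;
    public string Bar;
    public string Others;

    public override bool Equals(object obj)
    {
        return obj is ConstantHashThingy other
            && Foo == other.Foo
            && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        return 1; // worst case: all keys land in the same chain
    }
}

public static class ConstantHashDemo
{
    // Adds `count` distinct keys, then looks each one up through a fresh but
    // equal instance. Returns how many lookups returned the stored value.
    public static int CountSuccessfulLookups(int count)
    {
        var map = new Dictionary<ConstantHashThingy, int>();
        for (int i = 0; i < count; i++)
        {
            map.Add(new ConstantHashThingy { Foo = "foo" + i, Bar = "bar" }, i);
        }

        int found = 0;
        for (int i = 0; i < count; i++)
        {
            var probe = new ConstantHashThingy { Foo = "foo" + i, Bar = "bar" };
            if (map.TryGetValue(probe, out int value) && value == i)
            {
                found++;
            }
        }
        return found;
    }
}
```

Every lookup should succeed, since Equals settles each collision. The cost is that each lookup may compare against every stored key, so run time grows quickly as count increases.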
Hash collisions will primarily affect performance, not correctness, so long as Equals() behaves correctly.
Dictionary uses the hash code as a way to organize items into separate "buckets". If too many items share the same hash code, you can run into performance problems. However, as long as Equals() can correctly distinguish between instances, you should get correct results.
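Applied to the question, a sequential-key lookup might look something like the sketch below. ThingyKeyRegistry and GetOrAddKey are hypothetical names, the type check in Equals is written out as an assumption about what the asker's real code does, and the sketch is not thread-safe. Note that ("ab", "c") and ("a", "bc") are guaranteed to collide under the Foo + Bar hash, because both concatenate to "abc".

```csharp
using System;
using System.Collections.Generic;

public class Thingy
{
    public string Foo;
    public string Bar;
    public string Others;

    public override bool Equals(object obj)
    {
        // Type check plus comparison of the two key fields only.
        return obj is Thingy other && Foo == other.Foo && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        return (Foo + Bar).GetHashCode(); // same approach as in the question
    }
}

// Hypothetical helper that hands out one sequential int per distinct (Foo, Bar) pair.
public class ThingyKeyRegistry
{
    private readonly Dictionary<Thingy, int> ids = new Dictionary<Thingy, int>();

    public int GetOrAddKey(Thingy thing)
    {
        if (!ids.TryGetValue(thing, out int id))
        {
            id = ids.Count + 1; // next sequential value, starting at 1
            ids.Add(thing, id);
        }
        return id;
    }
}
```

With this, two Thingys that share a hash code but differ in Foo or Bar receive different keys, while a Thingy that differs only in Others gets the key already assigned. Because nothing is ever removed, ids.Count + 1 is always unused.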
Where hash codes can result in problems is with mutable objects. If your Thingy class allows Foo or Bar to change for an item in the dictionary, you may then fail to find it in a subsequent access attempt. This is because the hash code produced now differs from the one used to store the value in the dictionary.
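A small sketch of that failure mode, using made-up names (MutableThingy, MutableKeyDemo) and the same Foo + Bar hashing as the question:

```csharp
using System;
using System.Collections.Generic;

public class MutableThingy
{
    public string Foo;
    public string Bar;

    public override bool Equals(object obj)
    {
        return obj is MutableThingy other && Foo == other.Foo && Bar == other.Bar;
    }

    public override int GetHashCode()
    {
        return (Foo + Bar).GetHashCode();
    }
}

public static class MutableKeyDemo
{
    // Stores one key, then changes a key field while it sits in the dictionary.
    private static Dictionary<MutableThingy, int> BuildAndMutate(out MutableThingy key)
    {
        key = new MutableThingy { Foo = "foo", Bar = "bar" };
        var map = new Dictionary<MutableThingy, int> { { key, 1 } };
        key.Foo = "changed"; // the entry was filed under the hash of "foobar"
        return map;
    }

    // Lookup with the mutated instance itself: its hash is now that of
    // "changedbar", so this is almost always false (barring a hash coincidence).
    public static bool FoundByMutatedKey()
    {
        var map = BuildAndMutate(out MutableThingy key);
        return map.ContainsKey(key);
    }

    // Lookup with the original values: the hash matches the stored one,
    // but Equals fails against the mutated stored key, so this is false.
    public static bool FoundByOriginalValues()
    {
        var map = BuildAndMutate(out _);
        return map.ContainsKey(new MutableThingy { Foo = "foo", Bar = "bar" });
    }
}
```

The entry is still counted in the dictionary but can no longer be reached by key. Making the key fields read-only after construction, or removing and re-adding the item around any change, avoids this.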