Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need some explanation on how HashSet works

Tags:

c#

collections

That is what I have:

class A
{
  // The uniqueness of instance must be detected by this property
  public string Key { get; set; }

  // There are other properties
}

var set = HashSet<A>()

My general purposes are:

  • to provide identity of instances in the set collection by Key property value

  • to make this collection work as fast as possible for Contain operation

Answer to the following questions might help me to meet this purposes:

  1. What is used to run methods like Contains and Add which have to determine uniqueness of the instance: GetHashCode() or IEquatable? Most probably GetHashCode() as HashSet declared to be very fast for search.
  2. Default String.GetHashCode() implementation does not guarantee uniqueness of hash for 2 different string, so how can I provide uniqueness with performance in mind?
  3. Is IEquatable is used at all by HashSet?

Note his collection is created and destroyed in run-time only, it is not saved into Database

like image 296
YMC Avatar asked Sep 15 '25 23:09

YMC


1 Answers

Collections usually use Object.GetHashCode() and Object.Equals() to obtain hash codes and check for equality. There is no way to make Object.GetHashCode() return a unique hash code for all but the simplest objects - the hash code is only 32 bit wide and every object with more than 32 bits of internal state can not be mapped to unique hash codes. Therefore Object.Equals() is used to check for exact equality in case of hash code collisions.

In consequence you have to override both mentioned methods with a suitable implementation.

public override Int32 GetHashCode()
{
    // If this.Key may be null you have to handle this case.
    return this.Key.HashCode();
}

public override Boolean Equals(Object obj)
{
    var other = obj as A;

    return (other != null) && (this.Key == other.Key);
}

Alternatively you can use the HashSet<T> constructor accepting IEqualityComparer<T> and externalize both methods, for example if you have no control over the source code of the types you want to add to a set. Just create a class implementing the interface with suitable methods and pass an instance of this class to the HashSet<T> constructor.

like image 63
Daniel Brückner Avatar answered Sep 17 '25 13:09

Daniel Brückner