HashSet<T>.Add
first compares the results of GetHashCode
. If those are equal, it calls Equals
.
Now, my understanding is in order to implement GetHashCode
, something must be done with the fields of an object. A simple example implementation can be found at What is the best algorithm for an overridden System.Object.GetHashCode?.
In my test comparing both on 1.000.000 pairs of objects filled with random data, performance is more or less equal between the two. GetHashCode
is implemented as in the linked example, Equals
simply calls Equals
on all fields. So why would one want to use GetHashCode
over Equals
?
For some types, an Equals
test may be relatively expensive. It typically has to compare every field of the class. In other words, it takes linear time in the size of the class. Bigger classes are more expensive to compare for equality.
Now, what happens if you need to need to compare one object against 1000 others? Calling Equals
1000 times might get expensive. You need to do N*2000 field accesses, if N is the size of the class
GetHashCode
instead generates a "mostly unique" integer based on the contents of the class. In other words, the class fields are accessed once. And once you have that, you can compare this integer to the 1000 integers that make up the other objects' hash codes.
Even in such a naive use case, we now only need N*1000 field accesses.
But what if we store the hash code? When we insert an object into a hash set, its hash code is computed once. Now, any time we want to do a lookup in the hash set, we just need to compute one hash code (that of the external object), and then you just have to compare simple integers. So N class field accesses (for the new object whose hash code we need to compute), plus a number of integer comparisons, which vary, depending on the algorithm, but are 1) relatively few, and 2) cheap.
Because if an algorithm wants to test if 1 object is already in a set of 1.000.000 objects, it has to call Equals
1.000.000 times, but GetHashCode()
just once (and a few calls to Equals
to eliminate objects which are different though having the same hash code).
GetHashCode lets you put thing into buckets - multiple objects can have the same hash code. Equals is then used to find matches within a bucket. This let's you find things in large collections very quickly
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With