Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use GetHashCode() over Equals()?

HashSet<T>.Add first compares the results of GetHashCode. If those are equal, it calls Equals.

Now, my understanding is in order to implement GetHashCode, something must be done with the fields of an object. A simple example implementation can be found at What is the best algorithm for an overridden System.Object.GetHashCode?.

In my test comparing both on 1.000.000 pairs of objects filled with random data, performance is more or less equal between the two. GetHashCode is implemented as in the linked example, Equals simply calls Equals on all fields. So why would one want to use GetHashCode over Equals ?

like image 643
user247702 Avatar asked Jun 10 '11 10:06

user247702


3 Answers

For some types, an Equals test may be relatively expensive. It typically has to compare every field of the class. In other words, it takes linear time in the size of the class. Bigger classes are more expensive to compare for equality.

Now, what happens if you need to need to compare one object against 1000 others? Calling Equals 1000 times might get expensive. You need to do N*2000 field accesses, if N is the size of the class

GetHashCode instead generates a "mostly unique" integer based on the contents of the class. In other words, the class fields are accessed once. And once you have that, you can compare this integer to the 1000 integers that make up the other objects' hash codes.

Even in such a naive use case, we now only need N*1000 field accesses.

But what if we store the hash code? When we insert an object into a hash set, its hash code is computed once. Now, any time we want to do a lookup in the hash set, we just need to compute one hash code (that of the external object), and then you just have to compare simple integers. So N class field accesses (for the new object whose hash code we need to compute), plus a number of integer comparisons, which vary, depending on the algorithm, but are 1) relatively few, and 2) cheap.

like image 68
jalf Avatar answered Dec 07 '22 18:12

jalf


Because if an algorithm wants to test if 1 object is already in a set of 1.000.000 objects, it has to call Equals 1.000.000 times, but GetHashCode() just once (and a few calls to Equals to eliminate objects which are different though having the same hash code).

like image 30
Doc Brown Avatar answered Dec 07 '22 20:12

Doc Brown


GetHashCode lets you put thing into buckets - multiple objects can have the same hash code. Equals is then used to find matches within a bucket. This let's you find things in large collections very quickly

like image 40
Robert Levy Avatar answered Dec 07 '22 18:12

Robert Levy