Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is Object.GetHashCode() unique to a reference or a value?

Tags:

The MSDN documentation on Object.GetHashCode() describes 3 contradicting rules for how the method should work.

  1. If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
  2. For the best performance, a hash function must generate a random distribution for all input.
  3. The hash function must return exactly the same value regardless of any changes that are made to the object.

Rules 1 & 3 are contradictory to me.

Does Object.GetHashCode() return a unique number based on the value of an object, or the reference to the object. If I override the method I can choose what to use, but I'd like to know what is used internally if anyone knows.

like image 480
Dan Herbert Avatar asked Aug 29 '08 15:08

Dan Herbert


People also ask

Is string GetHashCode unique?

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code. The hash code itself is not guaranteed to be stable.

Is GetHashCode unique C#?

NO! A hash code is not an id, and it doesn't return a unique value. This is kind of obvious, when you think about it: GetHashCode returns an Int32 , which has “only” about 4.2 billion possible values, and there's potentially an infinity of different objects, so some of them are bound to have the same hash code.

What is GetHashCode used for?

A hash code is a numeric value which is used to insert and identify an object in a hash-based collection. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality.


2 Answers

Rules 1 & 3 are contradictory to me.

To a certain extent, they are. The reason is simple: if an object is stored in a hash table and, by changing its value, you change its hash then the hash table has lost the value and you can't find it again by querying the hash table. It is important that while objects are stored in a hash table, they retain their hash value.

To realize this it is often simplest to make hashable objects immutable, thus evading the whole problem. It is however sufficient to make only those fields immutable that determine the hash value.

Consider the following example:

struct Person {     public readonly string FirstName;     public readonly string Name;     public readonly DateTime Birthday;      public int ShoeSize; } 

People rarely change their birthday and most people never change their name (except when marrying). However, their shoe size may grow arbitrarily, or even shrink. It is therefore reasonable to identify people using their birthday and name but not their shoe size. The hash value should reflect this:

public int GetHashCode() {     return FirstName.GetHashCode() ^ Name.GetHashCode() ^ Birthday.GetHashCode(); } 
like image 51
Konrad Rudolph Avatar answered Oct 05 '22 10:10

Konrad Rudolph


Not sure what MSDN documentation you are referring to. Looking at the current documentation on Object.GetHashCode (http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx) provides the following "rules":

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  • For the best performance, a hash function must generate a random distribution for all input.

If you are referring to the second bullet point, the key phrases here are "as long as there is no modification to the object state" and "true only for the current execution of an application".

Also from the documentation,

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and must use at least one of the instance fields as input. [Emphasis added is mine.]

As for the actual implementation, it clearly states that derived classes can defer to the Object.GetHashCode implementation if and only if that derived class defines value equality to be reference equality and the type is not a value type. In other words, the default implementation of Object.GetHashCode is going to be based on reference equality since there are no real instance fields to use and, therefore, does not guarantee unique return values for different objects. Otherwise, your implementation should be specific to your type and should use at least one of your instance fields. As an example, the implementation of String.GetHashCode returns identical hash codes for identical string values, so two String objects return the same hash code if they represent the same string value, and uses all the characters in the string to generate that hash value.

like image 23
Scott Dorman Avatar answered Oct 05 '22 09:10

Scott Dorman