Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concise way to combine field hashcodes?

Tags:

java

c#

hash

One if the ways to implement GetHashCode - where it's required to do so - is outlined by Jon Skeet here. Repeating his code:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

Rolling this code by hand can be error-prone and bugs can be subtle/hard to spot (did you swap + and * by mistake?), it can be hard to remember the combination rules for different types, and I don't like expending mental effort on writing/reviewing the same thing over and over again for different fields and classes. It can also obfuscate one of the most important details (did I remember to include all the fields?) in repetitive noise.

Is there a concise way to combine field hashcodes using the .net library?. Obviously I could write my own, but if there's something idiomatic/built-in I'd prefer that.

As an example, in Java (using JDK7) I can achieve the above using:

   @Override
   public int hashCode()  
   {  
      return Objects.hash(field1, field2, field3);  
   }  

This really helps to eliminate bugs and focus in the important details.

Motivation: I came across a C# class which requires an overridden GetHashCode(), but the way it combined the hashcodes of its various constituents had some severe bugs. A library function for combining the hashcodes would be useful for avoiding such bugs.

like image 755
bacar Avatar asked Aug 05 '13 18:08

bacar


3 Answers

Some people use:

Tuple.Create(lastName, firstName, gender).GetHashCode()

It's mentioned on MSDN at Object.GetHashCode(), with the warning:

Note, though, that the performance overhead of instantiating a Tuple object may significantly impact the overall performance of an application that stores large numbers of objects in hash tables.

The logic of aggregating the constituent hashes is provided by System.Tuple, which hopefully has had some thought go into it...

Update: it is worth noting @Ryan's observation in the comments that this only appears to use the last 8 elements of any Tuple of Size>8.

like image 150
bacar Avatar answered Oct 03 '22 19:10

bacar


EDIT: System.HashCode has now been released. The recommended way of creating hashcodes is now this:

public override int GetHashCode()
{
    return HashCode.Combine(fieldA, fieldB, fieldC);
}

System.HashCode.Combine() will internally call .GetHashCode() on each field, and do the right thing automatically.

For very many fields (more than 8), you can create an instance of HashCode and then use the .Add() method:

public override int GetHashCode()
{
    HashCode hash = new HashCode();
    hash.Add(fieldA);
    hash.Add(fieldB);
    hash.Add(fieldC);
    hash.Add(fieldD);
    hash.Add(fieldE);
    hash.Add(fieldF);
    hash.Add(fieldG);
    hash.Add(fieldH);
    hash.Add(fieldI);
    return hash.ToHashCode();
}

Visual Studio 2019 now has a Quick Actions helper to generate Equals() and GetHashCode() for you. Simply right click the class name in the declaration > Quick Actions and Refactorings > Generate Equals and GetHashCode. Select the members you want it to use for equality, and also "Implement IEquatable", and then click OK.

One last thing: If you need to get the structural hash code of an object, for example if you want to include the hashcode of an array that changes based on its contents (aka structure) and not its reference, then you will need to cast the field to IStructuralEquatable and get its hash code manually, like so:

public override int GetHashCode()
{
    return HashCode.Combine(
        fieldA,
        ((IStructuralEquatable)stringArrayFieldB).GetHashCode(EqualityComparer<string>.Default));
}

This is because the IStructuralEquatable interface is almost always implemented explicitly, so casting to IStructuralEquatable is required to call IStructuralEquatable.GetHashCode() instead of the default object.GetHashCode() method.

Finally, in the current implementation the .GetHashCode of an int is simply the integer value itself, so passing in the hashcode value to HashCode.Combine() instead of the field itself makes no difference to the result.

Old Answer:

For the sake of completeness, here is the hashing algorithm taken from the .NET Tuple Reference source, line 52. Interestingly, this hash algorithm was copied over from System.Web.Util.HashCodeCombiner.

Here is the code:

public override int GetHashCode() {
    // hashing method taken from .NET Tuple reference
    // expand this out to however many items you need to hash
    return CombineHashCodes(this.item1.GetHashCode(), this.item2.GetHashCode(), this.item3.GetHashCode());
}

internal static int CombineHashCodes(int h1, int h2) {
    // this is where the magic happens
    return (((h1 << 5) + h1) ^ h2);
}

internal static int CombineHashCodes(int h1, int h2, int h3) {
    return CombineHashCodes(CombineHashCodes(h1, h2), h3);
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4) {
    return CombineHashCodes(CombineHashCodes(h1, h2), CombineHashCodes(h3, h4));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), h5);
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6, int h7) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6, h7));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6, int h7, int h8) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6, h7, h8));
}

Of course, the actual Tuple GetHashCode() (which is actually an Int32 IStructuralEquatable.GetHashCode(IEqualityComparer comparer)) has a large switch block to decide which one of these to call based upon how many items it is holding - your own code probably won't require that.

like image 26
Ryan Avatar answered Oct 03 '22 18:10

Ryan


It's not exactly the same, but we have a HashCodeHelper class in Noda Time (which has lots of types which override equality and hash code operations).

It's used like this (taken from ZonedDateTime):

public override int GetHashCode()
{
    int hash = HashCodeHelper.Initialize();
    hash = HashCodeHelper.Hash(hash, LocalInstant);
    hash = HashCodeHelper.Hash(hash, Offset);
    hash = HashCodeHelper.Hash(hash, Zone);
    return hash;
}

Note that it's a generic method, which avoids boxing for value types. It copes with null values automatically (using 0 for the value). Note that the MakeHash method has an unchecked block as Noda Time uses checked arithmetic as a project setting, whereas hash code calculations should be allowed to overflow.

like image 37
Jon Skeet Avatar answered Oct 03 '22 19:10

Jon Skeet