
C# - Generic HashCode implementation for classes

Tags:

c#

hashcode

I'm looking at how to build the best hash code for a class, and I've seen several algorithms. I saw this one: Hash Code implementation. It seems that the GetHashCode methods of .NET classes are similar (as you can see by inspecting the code with a reflector).

So the question is: why not create the static class below to build a hash code automatically, just by passing the fields we consider to be the "key"?

// Old version, see edit
using System.Collections;

public static class HashCodeBuilder
{
    public static int Hash(params object[] keys)
    {
        if (object.ReferenceEquals(keys, null))
        {
            return 0;
        }

        int num = 42;

        // checked arithmetic throws an OverflowException if the sum overflows int
        checked
        {
            for (int i = 0, length = keys.Length; i < length; i++)
            {
                num += 37;
                if (object.ReferenceEquals(keys[i], null))
                { }
                else if (keys[i].GetType().IsArray)
                {
                    foreach (var item in (IEnumerable)keys[i])
                    {
                        num += Hash(item);
                    }
                }
                else
                {
                    num += keys[i].GetHashCode();
                }
            }
        }

        return num;
    }
}

And use it like this:

// Old version, see edit
using System;

public sealed class A : IEquatable<A>
{
    public A()
    { }

    public string Key1 { get; set; }
    public string Key2 { get; set; }
    public string Value { get; set; }

    public override bool Equals(object obj)
    {
        return this.Equals(obj as A);
    }

    public bool Equals(A other)
    {
        // A null argument is never equal; otherwise compare the key fields.
        return object.ReferenceEquals(other, null)
            ? false
            : Key1 == other.Key1 && Key2 == other.Key2;
    }

    public override int GetHashCode()
    {
        return HashCodeBuilder.Hash(Key1, Key2);
    }
}

Wouldn't this be much simpler than always writing a custom method? Am I missing something?


EDIT

Based on all the remarks, I ended up with the following code:

using System.Collections;

public static class HashCodeBuilder
{
    public static int Hash(params object[] args)
    {
        if (args == null)
        {
            return 0;
        }

        int num = 42;

        unchecked
        {
            foreach(var item in args)
            {
                if (ReferenceEquals(item, null))
                { }
                else if (item.GetType().IsArray)
                {
                    foreach (var subItem in (IEnumerable)item)
                    {
                        num = num * 37 + Hash(subItem);
                    }
                }
                else
                {
                    num = num * 37 + item.GetHashCode();
                }
            }
        }

        return num;
    }
}


using System;

public sealed class A : IEquatable<A>
{
    public A()
    { }

    public string Key1 { get; set; }
    public string Key2 { get; set; }
    public string Value { get; set; }

    public override bool Equals(object obj)
    {
        return this.Equals(obj as A);
    }

    public bool Equals(A other)
    {
        if(ReferenceEquals(other, null))
        {
            return false;
        }
        else if(ReferenceEquals(this, other))
        {
            return true;
        }

        return Key1 == other.Key1
            && Key2 == other.Key2;
    }

    public override int GetHashCode()
    {
        return HashCodeBuilder.Hash(Key1, Key2);
    }
}
Arnaud F. asked Mar 27 '11 16:03



3 Answers

Your Equals method is broken - it's assuming that two objects with the same hash code are necessarily equal. That's simply not the case.

Your hash code method looked okay at a quick glance, but it could actually do with some work (see below). It means boxing any value-type values and creating an array every time you call it, but other than that it's okay (as SLaks pointed out, there are some issues around the collection handling). You might want to consider writing some generic overloads, which would avoid those performance penalties for common cases (1, 2, 3 or 4 arguments, perhaps). You might also want to use a foreach loop instead of a plain for loop, just to be idiomatic.
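A minimal sketch of the generic-overload idea, assuming a hypothetical `HashCodeHelper` class that is not part of the original answer. Because each overload takes its arguments by generic type, value types are not boxed and no `params` array is allocated. It reuses the 17/31 multiply-and-add combination shown further down in this answer, and treats a null argument as contributing 0.

```csharp
using System.Collections.Generic;

// Hypothetical helper: one generic overload per argument count, so value-type
// arguments are not boxed and no params array is allocated per call.
public static class HashCodeHelper
{
    private const int Seed = 17;
    private const int Multiplier = 31;

    // Null contributes 0; anything else uses its type's default hash code.
    private static int HashOf<T>(T value)
    {
        return value == null ? 0 : EqualityComparer<T>.Default.GetHashCode(value);
    }

    public static int Hash<T1>(T1 a)
    {
        unchecked
        {
            return Seed * Multiplier + HashOf(a);
        }
    }

    // Each larger overload folds one more value into the smaller overload's
    // result, giving the same 17/31 chain as the hand-written version.
    public static int Hash<T1, T2>(T1 a, T2 b)
    {
        unchecked
        {
            return Hash(a) * Multiplier + HashOf(b);
        }
    }

    public static int Hash<T1, T2, T3>(T1 a, T2 b, T3 c)
    {
        unchecked
        {
            return Hash(a, b) * Multiplier + HashOf(c);
        }
    }

    public static int Hash<T1, T2, T3, T4>(T1 a, T2 b, T3 c, T4 d)
    {
        unchecked
        {
            return Hash(a, b, c) * Multiplier + HashOf(d);
        }
    }
}
```

A class would then write `return HashCodeHelper.Hash(Key1, Key2);` in its `GetHashCode`. On .NET Core 2.1 and later, `System.HashCode.Combine` already provides generic overloads for one to eight values, so a hand-rolled helper like this is mainly useful on older frameworks.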

You could do the same sort of thing for equality, but it would be slightly harder and messier.

EDIT: For the hash code itself, you're only ever adding values. I suspect you were trying to do this sort of thing:

unchecked // overflow is expected here; let it wrap around
{
    int hash = 17;
    hash = hash * 31 + firstValue.GetHashCode();
    hash = hash * 31 + secondValue.GetHashCode();
    hash = hash * 31 + thirdValue.GetHashCode();
    return hash;
}

But that multiplies the hash by 31; it doesn't add 31. Currently your hash code always returns the same value for the same inputs whether or not they're in the same order, which isn't ideal.
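To make the ordering point concrete, here is a small sketch comparing the two schemes on integers (chosen because `int.GetHashCode()` returns the value itself). `HashOrderDemo`, `Additive` and `MultiplyAdd` are illustrative names only; `Additive` mirrors the question's first version but uses `unchecked` instead of `checked`.

```csharp
// Hypothetical demo class comparing the question's additive scheme
// with the multiply-and-add scheme described above.
public static class HashOrderDemo
{
    // Additive scheme: addition is commutative, so reordering the
    // inputs can never change the result.
    public static int Additive(params int[] values)
    {
        unchecked
        {
            int hash = 42;
            foreach (int v in values)
            {
                hash += 37;
                hash += v.GetHashCode();
            }
            return hash;
        }
    }

    // Multiply-and-add: every earlier value gets multiplied by 31 once more
    // for each later value, so each position is weighted differently.
    public static int MultiplyAdd(params int[] values)
    {
        unchecked
        {
            int hash = 17;
            foreach (int v in values)
            {
                hash = hash * 31 + v.GetHashCode();
            }
            return hash;
        }
    }
}
```

Here `Additive(1, 2)` and `Additive(2, 1)` both give 119, whereas `MultiplyAdd(1, 2)` gives 16370 and `MultiplyAdd(2, 1)` gives 16400.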

EDIT: It seems there's some confusion over what hash codes are used for. I suggest that anyone who isn't sure reads the documentation for Object.GetHashCode and then Eric Lippert's blog post about hashing and equality.

Jon Skeet answered Sep 20 '22 14:09


This is what I'm using:

using System;

public static class ObjectExtensions
{
    /// <summary>
    /// Simplifies correctly calculating hash codes based upon
    /// Jon Skeet's answer here
    /// http://stackoverflow.com/a/263416
    /// </summary>
    /// <param name="obj">The object being hashed. It is not read; it only
    /// lets the helper be called with extension-method syntax.</param>
    /// <param name="memberThunks">Thunks that return all the members upon which
    /// the hash code should depend.</param>
    /// <returns>A hash code combining the hash codes of all the members.</returns>
    public static int CalculateHashCode(this object obj, params Func<object>[] memberThunks)
    {
        // Overflow is okay; just wrap around
        unchecked
        {
            int hash = 5;
            foreach (var member in memberThunks)
            {
                // A null member contributes 0 instead of throwing.
                object value = member();
                hash = hash * 29 + (value == null ? 0 : value.GetHashCode());
            }
            return hash;
        }
    }
}

Example usage:

public class Exhibit
{
    public virtual Document Document { get; set; }
    public virtual ExhibitType ExhibitType { get; set; }

    #region System.Object
    public override bool Equals(object obj)
    {
        return Equals(obj as Exhibit);
    }

    public bool Equals(Exhibit other)
    {
        // The static object.Equals(a, b) tolerates null members on either side.
        return other != null &&
            object.Equals(Document, other.Document) &&
            object.Equals(ExhibitType, other.ExhibitType);
    }

    public override int GetHashCode()
    {
        return this.CalculateHashCode(
            () => Document, 
            () => ExhibitType);
    }
    #endregion
}
Carl G answered Sep 20 '22 14:09


Instead of calling keys[i].GetType().IsArray, you should try to cast it to IEnumerable (using the as keyword).
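A sketch of that suggestion, assuming a hypothetical `EnumerableHashCodeBuilder` (not code from the thread). Using `as IEnumerable` means lists and other collections are expanded, not just arrays. Since `string` also implements `IEnumerable`, it is special-cased here so strings keep their normal hash code. Two design choices differ from the question: a null contributes 0 (rather than being skipped), and a nested collection is hashed into a single combined value before being folded into the outer hash.

```csharp
using System.Collections;

// Hypothetical variant of the question's HashCodeBuilder that detects
// collections with "as IEnumerable" instead of GetType().IsArray.
public static class EnumerableHashCodeBuilder
{
    public static int Hash(params object[] args)
    {
        if (args == null)
        {
            return 0;
        }

        unchecked
        {
            int num = 42;
            foreach (object item in args)
            {
                num = num * 37 + HashItem(item);
            }
            return num;
        }
    }

    private static int HashItem(object item)
    {
        if (item == null)
        {
            return 0;
        }

        // string implements IEnumerable (of char); hash it as one value
        // rather than character by character.
        if (item is string)
        {
            return item.GetHashCode();
        }

        var sequence = item as IEnumerable;
        if (sequence != null)
        {
            // Combine the elements of any collection (array, list, set...).
            unchecked
            {
                int num = 42;
                foreach (object element in sequence)
                {
                    num = num * 37 + HashItem(element);
                }
                return num;
            }
        }

        return item.GetHashCode();
    }
}
```

With this version an `int[] { 1, 2 }` and a `List<int> { 1, 2 }` produce the same hash code, which the `IsArray` check would not give you.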

You can fix the Equals method without repeating the field list by registering a static list of fields, like I do here using a collection of delegates.
This also avoids allocating an array on every call.

Note, however, that my code doesn't handle collection properties.
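The linked code isn't reproduced here, but the idea can be sketched as follows. `KeyedEquality<T>` and the `Item` class (mirroring the question's `Key1`/`Key2`/`Value`) are hypothetical names. The key members are registered once per type as a static array of accessor delegates, and both `Equals` and `GetHashCode` walk that same list. Note that accessors returning value types will still box those values when returned as `object`.

```csharp
using System;

// Hypothetical helper: holds the key accessors for a type and uses them
// for both equality and hashing, so the field list is written only once.
public sealed class KeyedEquality<T> where T : class
{
    private readonly Func<T, object>[] keys;

    public KeyedEquality(params Func<T, object>[] keys)
    {
        this.keys = keys;
    }

    public bool AreEqual(T x, T y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        foreach (var key in keys)
        {
            // The static object.Equals handles null members on either side.
            if (!object.Equals(key(x), key(y))) return false;
        }
        return true;
    }

    public int GetHashCodeOf(T obj)
    {
        unchecked
        {
            int hash = 17;
            foreach (var key in keys)
            {
                object value = key(obj);
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            }
            return hash;
        }
    }
}

// Hypothetical example type modeled on the question's class A.
public sealed class Item : IEquatable<Item>
{
    // Registered once per type; no array is allocated per Equals/GetHashCode call.
    private static readonly KeyedEquality<Item> Keys =
        new KeyedEquality<Item>(i => i.Key1, i => i.Key2);

    public string Key1 { get; set; }
    public string Key2 { get; set; }
    public string Value { get; set; } // not part of the key

    public override bool Equals(object obj) => Equals(obj as Item);
    public bool Equals(Item other) => Keys.AreEqual(this, other);
    public override int GetHashCode() => Keys.GetHashCodeOf(this);
}
```

Like SLaks' version, this sketch compares collection-valued members by reference only.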

SLaks answered Sep 21 '22 14:09