What is the best way for calculating hashcode of a class with string properties? [duplicate]

Tags: c#, gethashcode

I have a class with string properties and I need to override the GetHashCode() method.

class A
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public string Prop3 { get; set; }
}

The first idea is to do something like this:

public override int GetHashCode()
{
    return Prop1.GetHashCode() ^ Prop2.GetHashCode() ^ Prop3.GetHashCode();
}

The second idea is:

public override int GetHashCode()
{
    return String.Join(";", new[] {Prop1, Prop2, Prop3}).GetHashCode();
}

What is the best way?

Warlock asked Nov 28 '12 05:11



2 Answers

You shouldn't just XOR them together, because this doesn't account for ordering. Imagine you have two objects:

"foo", "bar", "baz"

and

"bar", "foo", "baz"

With a simple XOR, both of these will have the same hash. Luckily it's pretty easy to work around. This is the code I use to combine hashes:

// Assumes: using System.Collections.Generic; using System.Diagnostics.Contracts;
static int MultiHash(IEnumerable<object> items)
{
    Contract.Requires(items != null);

    int h = 0;

    foreach (object item in items)
    {
         h = Combine(h, item != null ? item.GetHashCode() : 0);
    }

    return h;
}

static int Combine(int x, int y)
{
    unchecked
    {
        // This isn't a particularly strong way to combine hashes, but it's
        // cheap, respects ordering, and should work for the majority of cases.
        // Note: + binds tighter than ^ in C#; the parentheses only make that explicit.
        return ((x << 5) + 3 + x) ^ y;
    }
}

There are a lot of ways to combine hashes, but usually something very simple like this will do. If for some reason it doesn't work for your situation, MurmurHash has pretty robust hash combining you can pull.
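
To make the ordering point concrete, here is a minimal sketch that folds the same hash codes with plain XOR and with the Combine step above. The `HashDemo` class and its method names are hypothetical and exist only for illustration. It works on raw int hash codes rather than strings so that the results are deterministic.

```csharp
using System;

// Hypothetical helper for illustration; only Combine mirrors the answer's code.
public static class HashDemo
{
    // Plain XOR: commutative, so the order of the inputs is lost.
    public static int XorAll(params int[] hashes)
    {
        int h = 0;
        foreach (int hash in hashes)
        {
            h ^= hash;
        }
        return h;
    }

    // Order-sensitive fold using the same Combine step as above.
    public static int CombineAll(params int[] hashes)
    {
        int h = 0;
        foreach (int hash in hashes)
        {
            h = Combine(h, hash);
        }
        return h;
    }

    private static int Combine(int x, int y)
    {
        unchecked
        {
            return ((x << 5) + 3 + x) ^ y;
        }
    }
}
```

With these helpers, `HashDemo.XorAll(1, 2, 3)` and `HashDemo.XorAll(2, 1, 3)` give the same value, while `HashDemo.CombineAll(1, 2)` and `HashDemo.CombineAll(2, 1)` differ. In a real GetHashCode() you would pass something like `Prop1 != null ? Prop1.GetHashCode() : 0` for each property.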

Cory Nelson answered Sep 29 '22 08:09


Just XOR the hashes of each string together. It is cheaper (performance-wise) than string concatenation and, as far as I can see, it is not more prone to collisions. Let's assume that each string is 5 characters long and that each character takes up 1 byte. In the first approach, you are hashing 15 bytes to 4 bytes (int). In the second, you are concatenating all 3 strings (an expensive operation) to end up with one string of 15 bytes, and then you are hashing it to 4 bytes. Both transform 15 bytes to 4, so in theory both are quite similar in terms of collisions.

In reality there is a bit of a difference in the probabilities of collisions, but in practice it may not always matter. It depends on the data the strings will hold. Suppose all 3 strings are equal and each hashes to 0001 (I am using a simple number just for the sake of the example). XORing the first two gives 0000, and XORing the third with that gets you back to 0001, so the combined hash is no better than the hash of a single string. Concatenating the strings avoids this at the cost of some performance (if you are writing a performance-critical program, I wouldn't concatenate strings in the inner loop).
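
The cancellation described above is easy to reproduce. The sketch below uses a made-up `CollisionDemo` class (the name and methods are illustrative only) that wraps the two ideas from the question. It also exposes the joined string, which shows that concatenation has a blind spot of its own when the separator can appear inside the data.

```csharp
using System;

// Hypothetical demo class; the names are illustrative, not from the answers.
public static class CollisionDemo
{
    // First idea from the question: XOR of the three string hashes.
    public static int XorHash(string a, string b, string c)
    {
        return a.GetHashCode() ^ b.GetHashCode() ^ c.GetHashCode();
    }

    // Second idea from the question: the string that would then be hashed.
    public static string JoinKey(string a, string b, string c)
    {
        return String.Join(";", new[] { a, b, c });
    }
}
```

Because any value XORed with itself is 0, `CollisionDemo.XorHash("x", "x", "z")` equals `"z".GetHashCode()`: two equal properties cancel each other out. On the other side, `CollisionDemo.JoinKey("a;b", "c", "d")` and `CollisionDemo.JoinKey("a", "b;c", "d")` both produce "a;b;c;d", so those two objects would always share a hash under the second idea.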

So in the end, I haven't really given an answer after all, for the simple reason that there really isn't one. It all depends on where and how it will be used.

Cedric Mamo answered Sep 29 '22 08:09