Generate hash of object consistently

I'm trying to get a hash (md5 or sha) of an object.

I've implemented this: http://alexmg.com/post/2009/04/16/Compute-any-hash-for-any-object-in-C.aspx

I'm using NHibernate to retrieve my POCOs from a database.
When I run GetHash on one of them, the result is different each time the object is selected and hydrated from the database. I guess this is expected, as the underlying proxies will change.

Anyway,

Is there a way to get a hash of all the properties on an object, consistently each time?

I've toyed with the idea of using a StringBuilder over this.GetType().GetProperties..... and creating a hash on that, but that seems inefficient?

As a side note, this is for change-tracking these entities from one database (RDBMS) to a NoSQL store: I compare hash values to see whether objects changed between the RDBMS and the NoSQL store.

Alex asked Sep 12 '12 17:09


1 Answer

If you don't override GetHashCode, you inherit Object.GetHashCode. For a reference type, Object.GetHashCode returns a value tied to the identity of the instance, not to its contents (it is often described as the memory address, although the runtime does not guarantee that). Each time an object is loaded a new instance is created, so it will very likely get a different hash code.

It's debatable whether that's the correct thing to do; but that's what was implemented "back in the day" so it can't change now.

If you want something consistent, you have to override GetHashCode and compute a code from the "value" of the object (i.e. its properties and/or fields). This can be as simple as merging the hash codes of all the properties/fields in a way that distributes well, or as complicated as you need it to be. If all you need is something to differentiate two different objects, using a unique key on the object might work for you. If you're looking for change tracking, using the unique key for the hash probably isn't going to work, because the key stays the same when the other properties change.

I simply use all the hash codes of the fields to create a reasonably distributed hash code for the parent object. For example:

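public override int GetHashCode()
{
    // unchecked: let the multiplication overflow and wrap instead of throwing
    unchecked
    {
        // start from the first field; a null reference contributes 0
        int result = (Name != null ? Name.GetHashCode() : 0);
        // multiply the running result by a prime, then XOR in the next field
        result = (result*397) ^ (Street != null ? Street.GetHashCode() : 0);
        result = (result*397) ^ Age;
        return result;
    }
}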

Multiplying by the prime number 397 mixes each field's hash code into the running result, so that different combinations of field values are less likely to collide and the resulting hash codes are better distributed. See http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/ for more details on the use of primes in hash code calculations.

You could, of course, use reflection to get at all the properties, but that would be slower. Alternatively, you could use the CodeDOM to generate the hashing code dynamically from the reflected properties and cache it (i.e. generate it once and reload it next time). This is, of course, very complex and might not be worth the effort.
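As a rough illustration of the reflection route, here is a minimal sketch that combines the hash codes of all public properties with the same multiply-and-XOR scheme as above. The names ReflectionHash, Combine and Person are hypothetical and not part of the original answer. It reads the properties of the declared type T rather than instance.GetType(), on the assumption that this keeps any extra members of a runtime proxy out of the hash, and it caches the property list per type so the reflection lookup runs only once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Reflection;

// Hypothetical sample entity, mirroring the fields used in the answer.
public class Person
{
    public string Name { get; set; }
    public string Street { get; set; }
    public int Age { get; set; }
}

// Hypothetical helper; a sketch, not a tuned implementation.
public static class ReflectionHash
{
    // Property lists are cached per type so GetProperties runs once per type.
    private static readonly ConcurrentDictionary<Type, PropertyInfo[]> Cache =
        new ConcurrentDictionary<Type, PropertyInfo[]>();

    public static int Combine<T>(T instance) where T : class
    {
        if (instance == null) throw new ArgumentNullException(nameof(instance));

        PropertyInfo[] properties = Cache.GetOrAdd(typeof(T), t =>
            t.GetProperties(BindingFlags.Public | BindingFlags.Instance)
             .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
             .OrderBy(p => p.Name, StringComparer.Ordinal) // fixed, culture-independent order
             .ToArray());

        unchecked
        {
            int result = 0;
            foreach (PropertyInfo property in properties)
            {
                object value = property.GetValue(instance, null);
                // Same scheme as the hand-written version: multiply by a prime, XOR the next value.
                result = (result * 397) ^ (value != null ? value.GetHashCode() : 0);
            }
            return result;
        }
    }
}
```

Two Person instances with equal property values produce the same result within one process. Because it still relies on each property's own GetHashCode, which is not guaranteed to be stable across processes or runtime versions, the result is probably unsuitable for storing and comparing later.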

An MD5 or SHA hash or CRC is generally based on a block of data. If you want that, then using the hash code of each property doesn't make sense. Possibly serializing the data to memory and calculating the hash that way would be more applicable, as Henk describes.
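For the change-tracking case in the question, one possible shape of "serialize, then hash" is sketched below. It is only a sketch under stated assumptions: ContentHash, Sha256Of and Customer are hypothetical names, the serialization is a simple hand-rolled name/length/value text format rather than any particular serializer, and only simple scalar properties are handled (nested objects and collections would need their own treatment). Properties are written in a fixed order with invariant-culture formatting so the same values should always produce the same bytes.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sample entity for illustration only.
public class Customer
{
    public string Name { get; set; }
    public string Street { get; set; }
    public int Age { get; set; }
}

// Hypothetical helper; a sketch, not a definitive implementation.
public static class ContentHash
{
    public static string Sha256Of<T>(T instance) where T : class
    {
        if (instance == null) throw new ArgumentNullException(nameof(instance));

        var builder = new StringBuilder();
        var properties = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .OrderBy(p => p.Name, StringComparer.Ordinal); // fixed order

        foreach (PropertyInfo property in properties)
        {
            object value = property.GetValue(instance, null);
            builder.Append(property.Name).Append('=');
            if (value == null)
            {
                builder.Append("null;"); // distinguishes null from an empty string
            }
            else
            {
                string text = Convert.ToString(value, CultureInfo.InvariantCulture);
                // The length prefix keeps adjacent values from running together.
                builder.Append(text.Length.ToString(CultureInfo.InvariantCulture))
                       .Append(':').Append(text).Append(';');
            }
        }

        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(builder.ToString()));
            return BitConverter.ToString(digest).Replace("-", ""); // hex string
        }
    }
}
```

The returned hex string depends only on the property values, not on the instance or the process, so it can be stored next to the NoSQL copy and compared against a freshly computed value from the RDBMS row.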

Peter Ritchie answered Sep 19 '22 12:09