I need to generate a unique hash code for an object, based on its contents, e.g. DateTime(2011,06,04) should equal DateTime(2011,06,04).
The reason I need to write this? I'm writing a caching layer using PostSharp.
Update
I think I may have been asking the wrong question. As Jon Skeet pointed out, to be on the safe side, I need as many unique combinations in the cache key as there are combinations of potential data in the object. Therefore, the best solution might be to build up a long string that encodes the public properties for the object, using reflection. The objects are not too large so this is very quick and efficient:
If you need to create a unique hash code, then you're basically talking about a number which can represent as many states as your type can have. For DateTime
than means taking the Ticks value and the DateTimeKind
, I believe.
You may be able to get away with assuming that the top two bits of the Ticks
property are going to be zero, and using those to store the kind. That means you're okay up until the year 7307 as far as I can tell:
private static ulong Hash(DateTime when)
{
ulong kind = (ulong) (int) when.Kind;
return (kind << 62) | (ulong) when.Ticks;
}
From a comment:
I'd like something like a GUID based on the objects contents. I don't mind if there's the occasional duplicate every 10 trillion trillion trillion years or so
That seems like an unusual requirement but since that's your requirement, let's do the math.
Let's suppose you make a billion unique objects a year -- thirty per second -- for 10 trillion trillion trillion years. That's 1049 unique objects you're creating. Working out the math is quite easy; the probability of at least one hash collision in that time is above one in 1018 when the bit size of the hash is less than 384.
Therefore you'll need at least a 384 bit hash code to have the level of uniqueness that you require. That's a convenient size, being 12 int32s. If you're going to be making more than 30 objects a second or want the probability to be less than one in 1018 then more bits will be necessary.
Why do you have such stringent requirements?
Here's what I would do if I had your stated requirements. The first problem is to convert every possible datum into a self-describing sequence of bits. If you have a serialization format already, use that. If not, invent one that can serialize all possible objects that you are interested in hashing.
Then, to hash the object, serialize it into a byte array and then run the byte array through the SHA-384 or SHA-512 hashing algorithm. That will produce a professional-crypto-grade 384 or 512 bit hash that is believed to be unique even in the face of attackers trying to force collisions. That many bits should be more than enough to ensure low probability of collision in your ten trillion trillion trillion year timeframe.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With