Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I generate a hashcode from a byte array in C#?

Tags:

c#

hash

Say I have an object that stores a byte array and I want to be able to efficiently generate a hashcode for it. I've used the cryptographic hash functions for this in the past because they are easy to implement, but they are doing a lot more work than they should to be cryptographically oneway, and I don't care about that (I'm just using the hashcode as a key into a hashtable).

Here's what I have today:

struct SomeData : IEquatable<SomeData> {     private readonly byte[] data;     public SomeData(byte[] data)     {         if (null == data || data.Length <= 0)         {             throw new ArgumentException("data");         }         this.data = new byte[data.Length];         Array.Copy(data, this.data, data.Length);     }      public override bool Equals(object obj)     {         return obj is SomeData && Equals((SomeData)obj);     }      public bool Equals(SomeData other)     {         if (other.data.Length != data.Length)         {             return false;         }         for (int i = 0; i < data.Length; ++i)         {             if (data[i] != other.data[i])             {                 return false;             }         }         return true;     }     public override int GetHashCode()     {         return BitConverter.ToInt32(new MD5CryptoServiceProvider().ComputeHash(data), 0);     } } 

Any thoughts?


dp: You are right that I missed a check in Equals, I have updated it. Using the existing hashcode from the byte array will result in reference equality (or at least that same concept translated to hashcodes). for example:

byte[] b1 = new byte[] { 1 }; byte[] b2 = new byte[] { 1 }; int h1 = b1.GetHashCode(); int h2 = b2.GetHashCode(); 

With that code, despite the two byte arrays having the same values within them, they are referring to different parts of memory and will result in (probably) different hash codes. I need the hash codes for two byte arrays with the same contents to be equal.

like image 894
Andrew Avatar asked Aug 19 '08 14:08

Andrew


People also ask

How do I find the hashCode of a key?

GetHash(Object) method is used to get the hashcode of the specified key of a Hashtable object. This method is inherited from the Object Class. Syntax: protected virtual int GetHash(Object Key);

What is hashCode in C?

Advertisements. Hash Table is a data structure which stores data in an associative manner. In hash table, the data is stored in an array format where each data value has its own unique index value. Access of data becomes very fast, if we know the index of the desired data.

Which of the following method is used to find out hashCode of an object?

Java hashCode() Java Object hashCode() is a native method and returns the integer hash code value of the object. The general contract of hashCode() method is: Multiple invocations of hashCode() should return the same integer value, unless the object property is modified that is being used in the equals() method.


1 Answers

The hash code of an object does not need to be unique.

The checking rule is:

  • Are the hash codes equal? Then call the full (slow) Equals method.
  • Are the hash codes not equal? Then the two items are definitely not equal.

All you want is a GetHashCode algorithm that splits up your collection into roughly even groups - it shouldn't form the key as the HashTable or Dictionary<> will need to use the hash to optimise retrieval.

How long do you expect the data to be? How random? If lengths vary greatly (say for files) then just return the length. If lengths are likely to be similar look at a subset of the bytes that varies.

GetHashCode should be a lot quicker than Equals, but doesn't need to be unique.

Two identical things must never have different hash codes. Two different objects should not have the same hash code, but some collisions are to be expected (after all, there are more permutations than possible 32 bit integers).

like image 106
Keith Avatar answered Oct 11 '22 15:10

Keith