Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Suitable hash code methods for an array of bytes?

What is the best hash method for an array of byte?

The arrays are serialized class objects containing jpeg image passed between applications over TCP/IP.

The array size is about 200k.

like image 519
Chesnokov Yuriy Avatar asked Aug 30 '11 13:08

Chesnokov Yuriy


3 Answers

Any of the built-in hashing functions should do; depending on how much you care about collisions these are your options (from most collisions to least):

  • MD5
  • SHA1
  • SHA256
  • SHA384
  • SHA512

They are as simple to use as:

var hash = SHA1.Create().ComputeHash(data);

Bonus Marks: If you don't care about security (which I don't think you do given that you are getting the hashes for images) you might want to look into Murmur hash, which is designed for content hashing and not secure hashing (and is thus much faster). It isn't, however, in the framework so you will have to find an implementation (and you should probably go for Murmur3).

Edit: If you are looking for a HASHCODE for a byte[] array it's entirely up to you, it usually consists of bit shifting (by primes) and XORing. E.g.

public class ByteArrayEqualityComparer : IEqualityComparer<byte[]>
{
    public static readonly ByteArrayEqualityComparer Default = new ByteArrayEqualityComparer();
    private ByteArrayEqualityComparer() { }

    public bool Equals(byte[] x, byte[] y)
    {
        if (x == null && y == null)
            return true;
        if (x == null || y == null)
            return false;
        if (x.Length != y.Length)
            return false;
        for (var i = 0; i < x.Length; i++)
            if (x[i] != y[i])
                return false;
        return true;
    }

    public int GetHashCode(byte[] obj)
    {
        if (obj == null || obj.Length == 0)
            return 0;
        var hashCode = 0;
        for (var i = 0; i < obj.Length; i++)
            // Rotate by 3 bits and XOR the new value.
            hashCode = (hashCode << 3) | (hashCode >> (29)) ^ obj[i];
        return hashCode;
    }
}
// ...
var hc = ByteArrayEqualityComparer.Default.GetHashCode(data);

EDIT: If you want to validate that the value hasn't changed you should use CRC32.

like image 199
Jonathan Dickinson Avatar answered Nov 08 '22 09:11

Jonathan Dickinson


Jon Skeet has a good answer on how to override GetHashCode, which is based on general effective hash techniques where you start with a prime number, add it to the hash codes of the components multiplied by another prime number, allowing for overflow.

For your case, you would do:

static int GetByteArrayHashCode(byte[] array)
{
    unchecked
    {
        int hash = 17;

        // Cycle through each element in the array.
        foreach (var value in array)
        {
            // Update the hash.
            hash = hash * 23 + value.GetHashCode();            
        }

        return hash;
    }
}

Note in Jon's answer he goes into why this is better than XORing the hashes of the individual elements (and that anonymous types in C# currently do not XOR the hashes of the individual elements, but use something similar to the above).

While this will be faster than the hash algorithms from the System.Security.Cryptography namespace (because you are dealing with smaller hashes), the downside is that you might have more collisions.

You would have to test against your data and determine how often you are getting collisions vs. the work that needs to be done in the case of a collision.

like image 39
casperOne Avatar answered Nov 08 '22 10:11

casperOne


Based on the Compiler Generated GetHashCode()

public static int GetHashCode(byte[] array) {
    unchecked {
        int i = 0;
        int hash = 17;
        int rounded = array.Length & ~3;

        hash = 31 * hash + array.Length;

        for (; i < rounded; i += 4) {
            hash = 31 * hash + BitConverter.ToInt32(array, i);
        }

        if (i < array.Length) {
            int val = array[i];
            i++;

            if (i < array.Length) {
                val |= array[i] << 8;
                i++;

                if (i < array.Length) {
                    val |= array[i] << 16;
                }
            }

            hash = 31 * hash + val;
        }

        return hash;
    }
}

Ah... and a link to C# Murmurhash http://landman-code.blogspot.com/2009/02/c-superfasthash-and-murmurhash2.html

like image 4
xanatos Avatar answered Nov 08 '22 09:11

xanatos