I am using a hash set in which I store array of integers(32 bits). This means I need an algorithm to hash an array of integers. I am looking for a 32 bits integer(C# int) hash.
I have tried and edited two existing algorithms, which you can see the four versions of at the bottom, including their benchmark.
My questions are as follows:
1. Do you think the bottom algorithm is good for this purpose?
2. Is there a better algorithm available for this purpose?
Program information
16 entries
, and the integers are smaller than 10
, although both must support larger values. I could say the largest values that have any chance on occurring are 200 entries and integers of value 20.Benchmarks and code
below are my benchmarks and code, from worst to best performance in my program.
356525
const uint seed = 144
MurMurHash3 using bytes retrieved from the coordinates directly
Code is equal to https://gist.github.com/automatonic/3725443 The array of bytes is retrieved using the following code:
int size = Marshal.SizeOf(typeof(Coordinates2D));
int length = carCoords.Length;
Byte[] bytes = new Byte[size * length];
for (int i = 0; i < length; ++i)
{
GCHandle pinStructure = GCHandle.Alloc(carCoords[i], GCHandleType.Pinned);
Marshal.Copy(pinStructure.AddrOfPinnedObject(), bytes, i*size, size);
pinStructure.Free();
}
// Hash the byte array
return MurMurHash3.Hash(new System.IO.MemoryStream(bytes));
This is incredibly inefficient, because of the copying.
MurMurHash3 using bytes retrieved from the integers in the objects
public static int Hash2(RushHourPathLengthNode.Coordinates2D[] coords)
{
const uint c1 = 0xcc9e2d51;
const uint c2 = 0x1b873593;
uint h1 = seed;
uint k1 = 0;
uint streamLength = (uint)coords.Length * 2;
for (int i = 0, l = coords.Length; i < l; ++i)
{
// Do it for X
byte[] chunk = BitConverter.GetBytes(coords[i].x);
/* Get four bytes from the input into an uint */
k1 = (uint)
(chunk[0]
| chunk[1] << 8
| chunk[2] << 16
| chunk[3] << 24);
/* bitmagic hash */
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
// Do it for y
chunk = BitConverter.GetBytes(coords[i].y);
/* Get four bytes from the input into an uint */
k1 = (uint)
(chunk[0]
| chunk[1] << 8
| chunk[2] << 16
| chunk[3] << 24);
/* bitmagic hash */
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
}
// finalization, magic chants to wrap it all up
h1 ^= streamLength;
h1 = fmix(h1);
unchecked //ignore overflow
{
return (int)h1;
}
}
This is alot more efficient now the copying is gone.
MurMurHash3 using integers
public static int Hash(RushHourPathLengthNode.Coordinates2D[] coords)
{
const uint c1 = 0xcc9e2d51;
const uint c2 = 0x1b873593;
uint h1 = seed;
uint k1 = 0;
uint streamLength = (uint)coords.Length * 2;
for (int i = 0, l = coords.Length; i < l; ++i)
{
k1 = (uint)coords[i].x;
//bitmagic hash
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
k1 = (uint)coords[i].y;
//bitmagic hash
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
}
// finalization, magic chants to wrap it all up
h1 ^= streamLength;
h1 = fmix(h1);
unchecked //ignore overflow
{
return (int)h1;
}
}
Hash using integer addition multiplication
int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
hash = hash * 31 + carCoords[i].x;
hash = hash * 31 + carCoords[i].y;
}
return hash;
As you see, this one is far more efficient. It works well with any prime numbers. As I understand, there is no scientific proof of this to work, which I am not too fond of.
According to Michal B. a faster version would be using bitshifting. However, testing shows that this is not a successful hash. The problem takes significantly longer to run(It did not finish within 5 minutes). The bitshifting might be good, but it seems like the 31(prime number) is crucial.
int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
hash = hash << 5 - carCoords[i].x;
hash = hash << 5 - carCoords[i].y;
}
return hash;
In the end I went with the last algorithm.
int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
hash = hash * 19 + carCoords[i].x;
hash = hash * 19 + carCoords[i].y;
}
return hash;
This is very fast to compute, and for the (small) numbers I am using the hash is awesome.
If you are going to use this, make sure the numbers you use are prime numbers. Because of this you cannot use bitshifting to optimize it.
Have you considered using a space-filling curve to generate the hash? This will minimize (or eliminate) collisions for the chosen resolution (maxX, maxY)
Here are two SO questions and their answers that use this method.
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With