I'm looking for the optimal way to compute a hashcode for a set of bi-dimensional points (so that I can store polygons in a hashtable).
There are some obvious ways to do that, such as concatenating all the points coordinates in a string and its hashcode, but this would be very slow.
On the other end of the speed/collision spectrum, I can also for example sum up all the coordinates, which would result in a very fast code, but would also create a lot of collisions.
What's the optimal way to compute a hashcode for a set of points?
Is the optimal solution different if the coordinates are integer (vs real coordinates)?
Edit : I'm using .net so the hashcode should be 32 bits long.
The easiest way to compute a field's hash code is to just call `hashCode` on it. Combining them could be done manually.
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
hashCode in Java helps the program to run faster. For example, comparing two objects by their hashcodes will give the result 20 times faster than comparing them using the equals() function. This is so because hash data structures like HashMaps, internally organize the elements in an array-based data structure.
There is no optimal way for this job. It all depends on how big hash can you afford. You have to make tradoffs between speed and diffusion. Keep in mind that there is no such thing as optimal solution (if you do not exactly know what you are going to hash) In some cases xor can be good enough.
Take for instance this code
unsigned int JSHash(char* str, unsigned int len)
{
unsigned int hash = 1315423911;
unsigned int i = 0;
for(i = 0; i < len; str++, i++)
{
hash ^= ((hash << 5) + (*str) + (hash >> 2));
}
return hash;
}
/* End Of JS Hash Function */
You said that agregating points together is to slow. If you fix upper code it does not need any kind of agregation just pass trought (not much different that sums) And if you are using integeres and floats you would probably fix shifts (<< and >> are shift operations which together works like bitwise rotation) to fit your data type.
Check for other hash functions here: http://www.partow.net/programming/hashfunctions/
Optimal is dependent on your requirements from the hash computation.
Performance will come at the cost of more hash collisions.
Do you have a hard bound on either one? It's going to come down to a mathematical analysis of how much each percent of hash collisions is going to cost you in terms of performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With