Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the optimal way to compute a hashcode for a set of points?

I'm looking for the optimal way to compute a hashcode for a set of bi-dimensional points (so that I can store polygons in a hashtable).

There are some obvious ways to do that, such as concatenating all the points coordinates in a string and its hashcode, but this would be very slow.

On the other end of the speed/collision spectrum, I can also for example sum up all the coordinates, which would result in a very fast code, but would also create a lot of collisions.

What's the optimal way to compute a hashcode for a set of points?

Is the optimal solution different if the coordinates are integer (vs real coordinates)?

Edit : I'm using .net so the hashcode should be 32 bits long.

like image 904
Brann Avatar asked Aug 16 '09 13:08

Brann


People also ask

What is the best strategy to calculate hashCode?

The easiest way to compute a field's hash code is to just call `hashCode` on it. Combining them could be done manually.

Why is 31 used in hashCode?

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.

What is the valid use of hashCode method?

hashCode in Java helps the program to run faster. For example, comparing two objects by their hashcodes will give the result 20 times faster than comparing them using the equals() function. This is so because hash data structures like HashMaps, internally organize the elements in an array-based data structure.


2 Answers

There is no optimal way for this job. It all depends on how big hash can you afford. You have to make tradoffs between speed and diffusion. Keep in mind that there is no such thing as optimal solution (if you do not exactly know what you are going to hash) In some cases xor can be good enough.

Take for instance this code

unsigned int JSHash(char* str, unsigned int len)
{
    unsigned int hash = 1315423911;
    unsigned int i    = 0;

    for(i = 0; i < len; str++, i++)
    {
        hash ^= ((hash << 5) + (*str) + (hash >> 2));
    }

    return hash;
}
/* End Of JS Hash Function */

You said that agregating points together is to slow. If you fix upper code it does not need any kind of agregation just pass trought (not much different that sums) And if you are using integeres and floats you would probably fix shifts (<< and >> are shift operations which together works like bitwise rotation) to fit your data type.

Check for other hash functions here: http://www.partow.net/programming/hashfunctions/

like image 184
Luka Rahne Avatar answered Sep 30 '22 13:09

Luka Rahne


Optimal is dependent on your requirements from the hash computation.

Performance will come at the cost of more hash collisions.

Do you have a hard bound on either one? It's going to come down to a mathematical analysis of how much each percent of hash collisions is going to cost you in terms of performance.

like image 20
Yuval Adam Avatar answered Sep 30 '22 12:09

Yuval Adam