The problem in general: I have a big 2D point space, sparsely populated with dots. Think of it as a big white canvas sprinkled with black dots. I have to iterate over and search through these dots a lot. The canvas (point space) can be huge, bordering on the limits of int, and its size is unknown before points are set in it.
That brought me to the idea of hashing:
Ideal: I need a hash function that takes a 2D point and returns a unique uint32, so that no collisions can occur. You can assume that the number of dots on the canvas is easily countable by uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand (it may even change), so things like
canvaswidth * y + x
are sadly out of the question.
I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise: A hash function that provides keys with a very low probability of collision.
Any ideas anybody? Thanks for any help.
Best regards, Andreas T.
Edit: I had to change something in the question text: I changed the assumption "able to count the number of points of the canvas with uint32" into "able to count the dots on the canvas (or the number of coordinate pairs to store) with uint32". My original question didn't make much sense, because I would have had a sqrt(max(uint32)) x sqrt(max(uint32)) sized canvas, which is uniquely representable by a 16-bit shift and OR.
I hope this is OK, since all answers still make sense with the updated assumptions.
Sorry for that.
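For reference, the 16-bit shift-and-OR packing mentioned in the edit can be sketched like this. It is only a minimal illustration, assuming both coordinates are non-negative and fit in 16 bits; the function names pack16 and unpack16 are made up for this example.

```python
from typing import Tuple


def pack16(x: int, y: int) -> int:
    """Pack two coordinates in [0, 65535] into one unsigned 32-bit key."""
    if not (0 <= x <= 0xFFFF and 0 <= y <= 0xFFFF):
        raise ValueError("both coordinates must fit in 16 bits")
    # y occupies the high 16 bits, x the low 16 bits.
    return (y << 16) | x


def unpack16(key: int) -> Tuple[int, int]:
    """Recover (x, y) from a key produced by pack16."""
    return key & 0xFFFF, key >> 16
```

Because every (x, y) pair gets its own bit pattern, this mapping is collision-free, but only as long as the canvas is at most 65536 x 65536, which is exactly the restriction the updated question drops.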
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table.
Hash functions: A hash function maps each key to an integer in the range [0, N - 1], where N is the capacity of the bucket array for the hash table.
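A cryptographic hash function is a one-way algorithm that maps an input of any size to an output of a fixed length of bits. The resulting output is known as a hash digest, hash value, or hash code. Because the output has a fixed length, it cannot be strictly unique for every possible input; a good cryptographic hash only makes collisions extremely hard to find, which is why the digest is often treated as an identifier.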
A good hash function to use with integer key values is the mid-square method. The mid-square method squares the key value, and then takes out the middle r bits of the result, giving a value in the range 0 to 2^r - 1. This works well because most or all bits of the key value contribute to the result.
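The mid-square method described above can be sketched as follows. This is a minimal illustration, assuming a 32-bit key whose square is treated as a 64-bit value; the function name mid_square and the parameters r and key_bits are chosen for this example only.

```python
def mid_square(key: int, r: int = 16, key_bits: int = 32) -> int:
    """Square the key and return the middle r bits of the square.

    The key is truncated to key_bits bits, so its square is treated as
    a value that is 2 * key_bits bits wide.
    """
    if not 0 < r <= 2 * key_bits:
        raise ValueError("r must be between 1 and 2 * key_bits")
    key &= (1 << key_bits) - 1
    square = key * key
    # Drop the same number of bits below and above the middle window.
    shift = (2 * key_bits - r) // 2
    return (square >> shift) & ((1 << r) - 1)
```

With the defaults, mid_square(0x10000) returns 256: the square is 2^32, and shifting it right by 24 bits leaves 2^8 inside the 16-bit window.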
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
might be interesting, as it's closest to your original canvaswidth * y + x but will work for any non-negative x and y without knowing the canvas size (negative coordinates would first have to be mapped to non-negative ones). But for a real-world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with a bit manipulation such as Bob Jenkins' mix, calling that with x, y and a salt.
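Both suggestions can be sketched roughly as below. The zigzag step for negative coordinates and all function names are assumptions added for this example, not part of the answer. The mixing function is modelled on the three-word mix step from Bob Jenkins' lookup2 hash; the shift constants should be checked against his published code before relying on them, and the default salt is an arbitrary constant.

```python
MASK32 = 0xFFFFFFFF


def zigzag(n: int) -> int:
    """Map signed integers to non-negative ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def cantor_pair(x: int, y: int) -> int:
    """Cantor's pairing n = (x + y)(x + y + 1)/2 + y for non-negative x, y."""
    if x < 0 or y < 0:
        raise ValueError("cantor_pair expects non-negative integers")
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_pair_signed(x: int, y: int) -> int:
    """Collision-free pairing for signed coordinates via zigzag."""
    return cantor_pair(zigzag(x), zigzag(y))


def mix_hash(x: int, y: int, salt: int = 0x9E3779B9) -> int:
    """Mix x, y and a salt into one 32-bit hash (collisions are possible)."""
    a, b, c = x & MASK32, y & MASK32, salt & MASK32
    a = (a - b - c) & MASK32; a ^= c >> 13
    b = (b - c - a) & MASK32; b ^= (a << 8) & MASK32
    c = (c - a - b) & MASK32; c ^= b >> 13
    a = (a - b - c) & MASK32; a ^= c >> 12
    b = (b - c - a) & MASK32; b ^= (a << 16) & MASK32
    c = (c - a - b) & MASK32; c ^= b >> 5
    a = (a - b - c) & MASK32; a ^= c >> 3
    b = (b - c - a) & MASK32; b ^= (a << 10) & MASK32
    c = (c - a - b) & MASK32; c ^= b >> 15
    return c
```

Note that the Cantor value grows roughly like (x + y)^2 / 2, so it stays within 32 bits only for fairly small coordinates; Python integers do not overflow, but a fixed-width language would need a wider type. The mixing hash always fits in 32 bits, at the price of possible collisions that the hash table has to handle.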