
Hash function for 3d integer coordinates

I have a 3D uniform grid. To save memory in large models, the empty cells (those that don't overlap with any object) don't need to be stored, so I am using a Dictionary in C# for this purpose. Performance has already decreased, but this is still better than getting an exception when creating the 3D grid. Now my problem is to find a fast hash function that maps a 3D integer coordinate of the grid to a unique number.

I have already tried (x * 73856093 + y * 19349669 + z * 83492791) % n, which doesn't always generate a unique number.

ali asked Sep 03 '14 16:09


2 Answers

On the one hand you state your aim as “save memory”, while on the other hand you ask for “a fast hash function that maps a 3d integer coordinate of the grid to a unique number”. These two goals are not very compatible.

Either you want guaranteed O(1) access. In that case you have to prevent hash collisions entirely and must map every input to a unique number, which means you need as many cells in your hash map as there are possible inputs. So you would save no memory compared to a simple N×N×N array.

Or – and this is far more likely – you only want hash collisions to be rare. Then you can use a hash map with about twice as many slots as there are objects actually stored. In this case you don't have to avoid hash collisions completely; you only have to make them sufficiently rare.
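To illustrate that collisions cost lookup time rather than correctness, here is a minimal Python sketch (the question uses C#, where the same idea goes through `GetHashCode` and `Equals`). The `Cell` class and its deliberately weak hash are hypothetical and exist only to force collisions.

```python
class Cell:
    """Hypothetical grid key: an integer 3D coordinate."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __eq__(self, other):
        return (isinstance(other, Cell)
                and (self.x, self.y, self.z) == (other.x, other.y, other.z))

    def __hash__(self):
        # Deliberately weak: all cells with the same x + y + z collide.
        return self.x + self.y + self.z


grid = {}
grid[Cell(1, 2, 3)] = "object A"
grid[Cell(3, 2, 1)] = "object B"   # same hash value as the first key

# The equality check resolves the collision, so both entries are kept.
print(len(grid), grid[Cell(1, 2, 3)], grid[Cell(3, 2, 1)])
```

Only empty cells are absent from the dictionary, so memory grows with the number of occupied cells, while a weak hash merely makes lookups slower.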

Choosing a good hash function depends a lot on the likely patterns of your input data. If the input is fairly random and you know the size of your hash map, you should aim for a uniform distribution. If objects are more likely to be located in adjacent blocks, then you want to make sure that small changes in coordinates are unlikely to result in a collision. This is the point where it helps to not make your factors primes, so that a small change in one direction is less likely to collide with a small change in another direction.

If in doubt, you can always test things: Given three prime numbers (e.g. for the hash 137x+149y+163z) and some real-world setups (i.e. used coordinates and resulting hash map size), you can simply apply the hash to all coordinates, mod down to the hash map size and count the number of unique values. Do the same for various triples and choose the one which maximizes that number. But I doubt that level of optimization is really worth the effort.
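The test described above can be sketched in a few lines of Python. The function names, the sample coordinates, and the candidate triples are assumptions made for illustration; real measurements should use coordinates from your own models.

```python
from itertools import product


def count_unique_slots(coords, factors, table_size):
    """Count the distinct slots hit by (a*x + b*y + c*z) mod table_size."""
    a, b, c = factors
    return len({(a * x + b * y + c * z) % table_size for x, y, z in coords})


def best_factors(coords, candidates, table_size):
    """Return the candidate triple that fills the most distinct slots."""
    return max(candidates,
               key=lambda f: count_unique_slots(coords, f, table_size))


# Hypothetical test data: a small dense cluster of occupied cells.
coords = list(product(range(8), range(8), range(4)))   # 256 cells
table_size = 2 * len(coords)                           # about twice the stored objects
candidates = [(137, 149, 163), (1, 1, 1), (73856093, 19349669, 83492791)]

for f in candidates:
    print(f, count_unique_slots(coords, f, table_size))
print("best:", best_factors(coords, candidates, table_size))
```

Note that Python's `%` never returns a negative result for a positive modulus, whereas C#'s `%` takes the sign of the dividend, so a C# version needs extra care with negative coordinates or overflowed products.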

MvG answered Sep 19 '22 16:09


Rather than trying to write a new article on an already well-covered topic, see the Wikipedia article on hash functions. In particular, the first image clearly shows how multiple inputs are hashed to the same value.

Basically, your triplet is hashed to some hash value in the range [0,2^64 - 1] (duplicates allowed!). Then the range is reduced to something slightly larger than your number of input values (say n=5) via the equation hash = hash % n. The resulting relation for input values of say [(1,1,1), (1,2,3), (2321, 322, 232), (3,3,3)] will then look something like this:

    (1,1,1)          -> 2
    (1,2,3)          -> 0
    (2321, 322, 232) -> 0
    (3,3,3)          -> 3

As you can see, no input value relates (i.e. hashes) to 1 or 4, and two input values hash to 0.

The power of the hash (and the reason the average case is O(1)) is made clear by noting that in order to retrieve an input value from the hash table (e.g. (1,1,1)) the following steps occur.

  1. Input value's hash is calculated and hash = hash % n is applied, therefore (1,1,1) -> 2.
  2. A direct O(1) lookup is performed, i.e. hash_table[2] = (1,1,1) + additional data stored with this particular input value.
  3. Done!

In the case where more than one input value maps to the same hash value (0 in our example), the internal algorithm needs to search among those input values. Some implementations do this with a red-black tree (worst case O(log n)), in which case the worst case for any lookup is also O(log n); with plain chained buckets the search is a linear scan of the bucket.
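The lookup steps and the collision handling can be sketched with a toy table. This is a minimal illustration, not how any particular library implements its hash map: `ChainedTable` is a hypothetical name, and each bucket here is a plain list scanned linearly rather than a red-black tree.

```python
class ChainedTable:
    """Toy hash table with n buckets; each bucket is a list of (key, value) pairs."""

    def __init__(self, n):
        self.n = n
        self.buckets = [[] for _ in range(n)]

    def _slot(self, key):
        # Step 1: compute the hash and reduce it to a bucket index.
        return hash(key) % self.n

    def put(self, key, value):
        bucket = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # Step 2: jump straight to the bucket, then search only within it.
        for k, v in self.buckets[self._slot(key)]:
            if k == key:
                return v
        raise KeyError(key)


table = ChainedTable(5)
for point in [(1, 1, 1), (1, 2, 3), (2321, 322, 232), (3, 3, 3)]:
    table.put(point, "data for %s" % (point,))

print(table.get((1, 1, 1)))
print([len(b) for b in table.buckets])   # bucket occupancy; depends on the hash used
```

The bucket numbers produced by Python's built-in `hash` will generally differ from the illustrative mapping above; what carries over is that a lookup touches a single bucket.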

A perfect hash occurs when the relation becomes a one-to-one onto function (a bijection). This gives the best performance but is rare. As I stated earlier, it is luckily easy to produce an almost perfect hash where duplicates are scarce. In essence, make your hash function as random as possible.

The examples I gave in the comments might be adequate (though they are the wrong way to do it), but a more standard calculation would be: hash = (((((prime1 + value1) * prime2) + value2) * prime3) + value3) * prime4

which also answers the question. Note that the primes can be any prime numbers, but small values like 31, 37, etc. are usually used in practice.
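As a rough sketch, the formula can be written out in Python as follows. The function name `combine_hash` and the specific primes are arbitrary choices for illustration. Python integers do not overflow, so the mask is an assumption added to emulate the 32-bit wraparound that unchecked C# integer arithmetic would give; the result is returned as an unsigned 32-bit value.

```python
MASK32 = 0xFFFFFFFF  # assumption: emulate 32-bit wraparound


def combine_hash(x, y, z, primes=(17, 31, 37, 41)):
    """Chain add-then-multiply steps: (((((p1 + x) * p2) + y) * p3) + z) * p4."""
    p1, p2, p3, p4 = primes
    h = (p1 + x) & MASK32
    h = (h * p2 + y) & MASK32
    h = (h * p3 + z) & MASK32
    return (h * p4) & MASK32


n = 5  # table size from the example above
for point in [(1, 1, 1), (1, 2, 3), (2321, 322, 232), (3, 3, 3)]:
    print(point, combine_hash(*point) % n)
```

Because each coordinate is multiplied by a different chain of primes, swapping two coordinates generally changes the result, which a plain sum of the coordinates would not do.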

In practice, testing can be used to check the performance, but it is usually not necessary.

In any case, re-reading your question, I am wondering why you don't drop the entire hash idea and just store your points in a simple array.

Floris answered Sep 20 '22 16:09