Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good way to hash a float vector?

I am well aware of all the problems involved in comparing floats. This is exactly the reason for this question.
I'm looking to create a fast hash table for values that are 3D vectors (3 floats - x,y,z). It can be assumed that the length of the vector is always 1.0 (sqrt(x*x+y*y+z*z) is 1.0)

Essentially this means that I'm looking for a hash function that takes values that are almost equal to the same unsigned int value and a corresponding equality operator that is true if the hash values are equal (not not necessarily only if they are equal)

Edit -
False positives (i.e. vectors that are different but map to the same bucket) are a given since this is a hash table.
False negatives (i.e. Vectors that are close but map to different buckets) are undesirable but it seems that there is no way to avoid them. In my case, they will not causes total breakage, just some data duplication which is something I'll have to live with.

like image 760
shoosh Avatar asked Mar 16 '09 12:03

shoosh


1 Answers

I think what you're looking for is not directly possible. One important property of equality is that it is transitive. (ie. If a == b and b == c, then a == c). With a distance measure, though, you really don't want this property. Example:

Take a single float (for simplicity). Suppose we want to hash each float so that floats less than 1e-3 away are the same value. Now, suppose we add to this hash table 1000 float values all 1e-4 apart. Any neighboring 2 values should hash to the same float, since they are closer than 1e-3. However, because of transitivity, the neighbors of those values should also has to the same value, and their neighbors and so on. As a result, all 1000 values, including pairs farther than 1e-3 apart, would hash to the same integer. If you were to draw these points on a picture:

A  B  C  D  E  F  G  H ... Y Z 

Suppose all the gaps are < 1e-3 apart, but A and Z are > 1e-3 apart (not to scale!). This can't be satisfied because if hash(A) == hash(B) and hash(B) == hash(C) and so on for all pairs, (since they are < 1e-3 apart) than hash(A) must == hash(Z).

One possible option is to define regions of your vector space in which all vectors would hash to the same value (ie round them before hashing them), but you could still get 2 vectors on the edges of their respective spaces that are close together but hash to a different value. You could get around that by searching all neighboring spaces for a vector. (ie in the 1-d case above, you'd round all vectors to the nearest multiple of 1e-3, and then search the neighbors, so 5.3e-3 would search 5e-3, 4e-3 and 6-e3. In higher-dimensional cases, you'd have to search neighbors in all dimensions.)

like image 154
Jesse Rusak Avatar answered Nov 01 '22 17:11

Jesse Rusak