Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multidimensional search by combining geospatial indexes

I build application that stores millions of floating point vectors, each vector having ~100 dimensions. With a query vector, I need to search through these vectors for the k nearest (euclidean) matches. Run time must be faster than scanning all millions of vectors. By "vector" I mean in the linear algebra term a list of ~100 floating point numbers i.e. [0.3, -15.7, 0.004, 457.1, ...]

I know databases like MySQL and MongoDB provide spatial indexes that work for 2 dimensions. Is there a way to adapt this to many more dimensions maybe with composite indexes? Or are there other other data stores support indexes on more dimension?

like image 983
muudscope Avatar asked May 10 '13 19:05

muudscope


1 Answers

If you are looking for exact matches, 100 dimensions is a lot. If you are prepared to settle for approximate matches, there is a class of Locality-Sensitive-Hashing schemes. You could generate a hash or series of hash values for you data sets and use an ordinary database or a 2-d spatial database to look for matches based on the hash value. One reference is http://people.csail.mit.edu/indyk/p117-andoni.pdf.

like image 185
mcdowella Avatar answered Sep 28 '22 14:09

mcdowella