I am building an application that stores millions of floating-point vectors, each with ~100 dimensions. Given a query vector, I need to find the k nearest matches by Euclidean distance, and the search must run faster than scanning all of the vectors. By "vector" I mean the linear-algebra term: a list of ~100 floating-point numbers, e.g. [0.3, -15.7, 0.004, 457.1, ...]
I know databases like MySQL and MongoDB provide spatial indexes that work for 2 dimensions. Is there a way to adapt these to many more dimensions, perhaps with composite indexes? Or are there other data stores that support indexes over more dimensions?
If you are looking for exact matches, 100 dimensions is a lot. If you are prepared to settle for approximate matches, there is a class of locality-sensitive hashing (LSH) schemes. You could generate a hash (or a series of hash values) for your data set and use an ordinary database, or a 2-d spatial database, to look up matches by hash value. One reference is http://people.csail.mit.edu/indyk/p117-andoni.pdf.
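To make the idea concrete, here is a minimal sketch of one common LSH family for Euclidean distance (random projections quantized into buckets), assuming NumPy. The constants (`n_planes`, `bucket_width`, the dataset size) are illustrative, not tuned:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(42)
dim, n_planes, bucket_width = 100, 8, 4.0

# Random projection directions plus random offsets: nearby vectors land
# in the same integer bucket with high probability.
planes = rng.normal(size=(n_planes, dim))
offsets = rng.uniform(0, bucket_width, size=n_planes)

def lsh_key(v):
    """Project v onto the random planes and quantize into bucket indices."""
    return tuple(np.floor((planes @ v + offsets) / bucket_width).astype(int))

# Index: map each hash key to the list of vector ids in that bucket.
vectors = rng.normal(size=(10_000, dim))
table = defaultdict(list)
for i, v in enumerate(vectors):
    table[lsh_key(v)].append(i)

# Query: hash the query, then scan only the vectors in its bucket exactly.
query = vectors[0]
candidates = table.get(lsh_key(query), [])
best = min(candidates, key=lambda i: np.linalg.norm(vectors[i] - query))
```

In practice you would build several such tables with independent random planes and union their candidate buckets, since a single table can miss true neighbors that fall just across a bucket boundary; the paper linked above covers how to choose the number of tables and the bucket width.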