Fast lookup for dictionary vector to a given vector. High dimensions

Question

I'm looking for an answer that scales, but for my specific purpose, I have a 48th dimension vector. This could be represented as an array of 48 integers all between 0 and 255.

I have a large dictionary of these vectors, approximately 25 thousand of them.

I need to be able to take a vector that may or may not be in my database, and quickly find which vector from the database is closest. By closest, I mean in terms of traditional distance formula.

My code will end up in python but this is more a general question.

Brute Force is too slow. I need a near dictionary speed lookup. Anyone have an idea?

Il-Bhima · Accepted Answer

I would suggest implementing a kd-tree on which you can perform Nearest neighbour search. The worst case search time for N points in k dimensions is O(k.N^(1-1/k)) so it should scale sublinearly in N.

If I have time I will come back to this answer and provide a less terse explanation that Wikipedia's.

Since you're working in python this Scipy cookbook entry on kdtrees should help.

Aaron · Answer

Another technique that will prove useful is locality sensitive hashing: http://en.wikipedia.org/wiki/Locality_sensitive_hashing

Its not clear from your question whether you need -exact- nearest neighbors. If you are happy with returning a vector that is approximately the nearest neighbor, there are faster solutions. See here (http://www.cs.umd.edu/~mount/ANN/)

Fast lookup for dictionary vector to a given vector. High dimensions

Tags:

python

algorithm

math

vector

Piper Merriam

2 Answers

Il-Bhima

Aaron

Recent Activity

Donate For Us

Fast lookup for dictionary vector to a given vector. High dimensions

Tags:

python

algorithm

math

vector

Piper Merriam

2 Answers

Il-Bhima

Aaron

Related questions

Recent Activity

Donate For Us