Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the closest 2 points in a 100 dimensional space with 500,000 points?

I have a database with 500,000 points in a 100 dimensional space, and I want to find the closest 2 points. How do I do it?

Update: Space is Euclidean, Sorry. And thanks for all the answers. BTW this is not homework.

like image 899
louzer Avatar asked Oct 10 '10 05:10

louzer


People also ask

How do you find the closest point to a set of points?

The closest pair is the minimum of the closest pairs within each half and the closest pair between the two halves. To split the point set in two, we find the x-median of the points and use that as a pivot. Finding the closest pair of points in each half is subproblem that is solved recursively.

What is the complexity of the 2d closest pair problem?

The time complexity of this algorithm will be O(n log n).

What is the average case complexity of closest pair problem?

d) O(N3 log N) Answer: c Explanation: The efficiency of closest pair algorithm by brute force technique is mathematically found to be O(N2). 4. What is the basic operation of closest pair algorithm using brute force technique?


3 Answers

There's a chapter in Introduction to Algorithms devoted to finding two closest points in two-dimensional space in O(n*logn) time. You can check it out on google books. In fact, I suggest it for everyone as the way they apply divide-and-conquer technique to this problem is very simple, elegant and impressive.

Although it can't be extended directly to your problem (as constant 7 would be replaced with 2^101 - 1), it should be just fine for most datasets. So, if you have reasonably random input, it will give you O(n*logn*m) complexity where n is the number of points and m is the number of dimensions.

edit
That's all assuming you have Euclidian space. I.e., length of vector v is sqrt(v0^2 + v1^2 + v2^2 + ...). If you can choose metric, however, there could be other options to optimize the algorithm.

like image 129
Nikita Rybak Avatar answered Sep 23 '22 08:09

Nikita Rybak


Use a kd tree. You're looking at a nearest neighbor problem and there are highly optimized data structures for handling this exact class of problems.

http://en.wikipedia.org/wiki/Kd-tree

P.S. Fun problem!

like image 27
Stefan Mai Avatar answered Sep 24 '22 08:09

Stefan Mai


You could try the ANN library, but that only gives reliable results up to 20 dimensions.

like image 38
dalle Avatar answered Sep 24 '22 08:09

dalle