I have a database with 500,000 points in a 100 dimensional space, and I want to find the closest 2 points. How do I do it?
Update: Space is Euclidean, Sorry. And thanks for all the answers. BTW this is not homework.
The closest pair is the minimum of the closest pairs within each half and the closest pair between the two halves. To split the point set in two, we find the x-median of the points and use that as a pivot. Finding the closest pair of points in each half is subproblem that is solved recursively.
The time complexity of this algorithm will be O(n log n).
d) O(N3 log N) Answer: c Explanation: The efficiency of closest pair algorithm by brute force technique is mathematically found to be O(N2). 4. What is the basic operation of closest pair algorithm using brute force technique?
There's a chapter in Introduction to Algorithms devoted to finding two closest points in two-dimensional space in O(n*logn) time. You can check it out on google books. In fact, I suggest it for everyone as the way they apply divide-and-conquer technique to this problem is very simple, elegant and impressive.
Although it can't be extended directly to your problem (as constant 7
would be replaced with 2^101 - 1
), it should be just fine for most datasets. So, if you have reasonably random input, it will give you O(n*logn*m)
complexity where n
is the number of points and m
is the number of dimensions.
edit
That's all assuming you have Euclidian space. I.e., length of vector v
is sqrt(v0^2 + v1^2 + v2^2 + ...)
. If you can choose metric, however, there could be other options to optimize the algorithm.
Use a kd tree. You're looking at a nearest neighbor problem and there are highly optimized data structures for handling this exact class of problems.
http://en.wikipedia.org/wiki/Kd-tree
P.S. Fun problem!
You could try the ANN library, but that only gives reliable results up to 20 dimensions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With