You are given a set U of n points on the plane and you can compute distance between any pair of points in constant time. Choose a subset of U called C such that C has exactly k points in it and the distance between the farthest 2 points in C is as small as possible for given k. 1 < k <= n
What's the fastest way to do this besides the obvious n-choose-k solution?
The closest pair is the minimum of the closest pairs within each half and the closest pair between the two halves. To split the point set in two, we find the x-median of the points and use that as a pivot. Finding the closest pair of points in each half is subproblem that is solved recursively.
A KD Tree does the job. It's a geometric data structure that can find nearest neighbors efficiently by cutting the search space similarly to a binary search.
A solution is shown in Finding k points with minimum diameter and related problems - Aggarwal, 1991. The algorithm described therein has running time: O(k^2.5 n log k + n log n)
For those who have no access to the paper: the problem is called k-diameter and definied as
Find a set of k points with minimum diameter. The diameter of a set is the maximum distance between any points of the set.
I cannot really give an overview over the presented algorithm, but it includes computing the (3k - 3)th order Voronoi diagram of the points, and then solve the problem for each of the O(kn) Voronoi sets (by computing maximum independent sets in some bipartite graphs)... I guess that I am trying to say is, that it is highly non-trivial, and far beyond both an interview and this site :-)
Since this is an interview question, here is my shot at a solution. (As dcn points out below, this is not guaranteed to return the optimal solution, though it should still be a decent heuristic. Good catch, dcn!)
There are O(k) points in Sp, and O(n) points outside of it, so finding the point with the smallest max distance is O(nk). We repeat this k times, then repeat the whole procedure n times, for an overall complexity of O(n2k2).
We can improve on this by caching the max distance between any point in Sp and each point outside of Sp. If maxDistanceFromPointInS[pointOutsideS]
is, say, an O(1) hash-table containing the current max distance between every point pointOutsideS
and some point inside Sp, then every time we add a new point newPoint
, we set maxDistanceFromPointInS[p] = Max(maxDistanceFromPointInS[p], distance(newPoint, p))
for all points p
outside of Sp. Then finding the smallest max distance is O(n), adding a point to Sp is O(n). Repeating this k times gives us O((n+n)k) = O(nk). Finally, we repeat the whole procedure n times, for an overall complexity of O(n2k).
We could improve finding the smallest max distance to O(1) using a heap, but that would not change the overall complexity.
By the way, it took an hour to write this all up - there's no way I could have done this in an interview.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With