 

Finding the farthest point in one set from another set

My goal is a more efficient implementation of the algorithm posed in this question.

Consider two sets of points in N-space (3-space for the example case of RGB colorspace; a solution for 1-space or 2-space differs only in the distance calculation). How do you find the point in the first set that is farthest from its nearest neighbor in the second set?

In a 1-space example, given the sets A:{2,4,6,8} and B:{1,3,5}, the answer would be 8, as 8 is 3 units away from 5 (its nearest neighbor in B) while all other members of A are just 1 unit away from their nearest neighbor in B. edit: 1-space is overly simplified, as sorting is related to distance in a way that it is not in higher dimensions.

The solution in the source question involves a brute-force comparison of every point in one set (all R,G,B where 512>=R+G+B>=256 and R%4=0 and G%4=0 and B%4=0) against every point in the other set (colorTable). Ignore, for the sake of this question, that the first set is generated programmatically instead of iterated over as a stored list like the second set.
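For concreteness, here is a minimal brute-force sketch of that approach in 1-space, with A and B as hypothetical stand-ins for the generated RGB points and colorTable:

    # Brute force: for every point in A, scan all of B for its nearest neighbor,
    # then keep the point of A whose nearest neighbor is farthest away.
    A = [2, 4, 6, 8]   # stand-in for the first (generated) set
    B = [1, 3, 5]      # stand-in for the second set (colorTable)

    best_point, best_dist = None, -1
    for a in A:
        nearest = min(abs(a - b) for b in B)   # distance from a to its nearest neighbor in B
        if nearest > best_dist:
            best_point, best_dist = a, nearest

    print(best_point, best_dist)   # -> 8 3

That is O(N*M) distance computations, which is what the answer below tries to improve on.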

Asked Feb 26 '09 by Sparr

1 Answer

First you need to find every element's nearest neighbor in the other set.

To do this efficiently you need a nearest neighbor algorithm. Personally I would implement a kd-tree just because I've done it in the past in my algorithm class and it was fairly straightforward. Another viable alternative is an R-tree.

Do this once for each element in the smaller set. (Take each element of the smaller set in turn and query the structure built over the larger set to find that element's nearest neighbor.)

From this you should be able to get a list of nearest neighbors for each element.

As you find each nearest-neighbor pair, add it to a data structure with fast insertion and a fast getMax, such as a max-heap keyed on the Euclidean distance.

Then, once you're done simply ask the heap for the max.
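As a rough sketch of that pipeline (assuming SciPy is available for the kd-tree, and using Python's heapq with negated distances since it is a min-heap; A and B are the hypothetical 1-space sets from the question):

    import heapq
    from scipy.spatial import cKDTree

    A = [(2,), (4,), (6,), (8,)]   # hypothetical stand-in for the first set
    B = [(1,), (3,), (5,)]         # hypothetical stand-in for the second set (colorTable)

    tree = cKDTree(B)              # build the kd-tree over the second set once

    heap = []                      # max-heap simulated by pushing negated distances
    for a in A:
        dist, idx = tree.query(a)                # a's nearest neighbor in B and its distance
        heapq.heappush(heap, (-dist, a, B[idx]))

    neg_dist, farthest, its_nn = heap[0]         # getMax is just a peek at the heap root
    print(farthest, its_nn, -neg_dist)           # -> (8,) (5,) 3.0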

The run time for this breaks down as follows:

N = size of smaller set
M = size of the larger set

  • O(N log M) for the N kd-tree nearest-neighbor queries.
  • O(N) for calculating the Euclidean distances before adding them to the heap.
  • O(N log N) for adding the N pairs into the heap.
  • O(1) to get the final answer :D

So in the end the whole algorithm is O(N log M) (the O(N log N) heap term is absorbed, since N <= M).

If you don't care about the order of each pair you can save a bit of time and space by only keeping the max found so far.
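For example, here is a self-contained variant of the sketch above that drops the heap and keeps only the maximum nearest-neighbor distance (again assuming SciPy, with the same hypothetical 1-space sets):

    import numpy as np
    from scipy.spatial import cKDTree

    A = np.array([[2], [4], [6], [8]])     # hypothetical first set
    B = np.array([[1], [3], [5]])          # hypothetical second set (colorTable stand-in)

    dists, _ = cKDTree(B).query(A)         # nearest-neighbor distance in B for every point of A
    print(A[dists.argmax()], dists.max())  # -> [8] 3.0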

*Disclaimer: This all assumes you won't be using an enormously high number of dimensions and that your elements follow a mostly random distribution.

Answered by Ben S