Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sort array of locations by nearest point

Tags:

algorithm

So the question is like this:

Given a location X and an array of locations, I want to get an array of locations that are closest to location X, in other words sorted by closest distance.

The way I solved this is by iterating through each location array and calculate the distance between X and that particular location, store that distance, and sort the location by distance using a Comparator. Done! Is there a better way to do this? Assuming that the sort is a merge sort, it should be O(n log n).

like image 814
xonegirlz Avatar asked Oct 30 '11 19:10

xonegirlz


2 Answers

If I understand this right, you can do this pretty quickly for multiple queries - as in, given several values of X, you wouldn't have to re-sort your solution array every time. Here's how you do it:

  1. Sort the array initially (O(n logn) - call this pre-processing)
  2. Now, on every query X, binary search the array for X (or closest number smaller than X). Maintain, two indices, i and j, one which points to the current location, one to the next. One among these is clearly the closest number to X on the list. Pick the smaller distance one and put it in your solution array. Now, if i was picked, decrement i - if j was picked, increment j. Repeat this process till all the numbers are in the solution array.

This takes O(n + logn) for each query with O(nlogn) preprocessing. Of course, if we were talking about just one X, this is not better at all.

like image 65
kyun Avatar answered Sep 27 '22 00:09

kyun


The problem you describe sounds like a m-nearest neighbor search to me.

So, if I correctly understood your question, i.e. the notion of a location being a vector in a multidimensional metric space, and the distance being a proper metric in this space, then it would be nice to put the array of locations in a k-d-Tree. You have some overhead for the tree building once, but you get the search for O(log n) then.

A benefit of this, assuming you are just interested in the m < n closest available locations, you don't need to evaluate all n distances every time you search for a new X.

like image 34
moooeeeep Avatar answered Sep 26 '22 00:09

moooeeeep