I have a large amount of 2D sets of coordinates on a 6000x6000 plane (2116 sets), available here: http://pastebin.com/kiMQi7yu (the context isn't really important so I just pasted the raw data).
I need to write an algorithm to group together coordinates that are close to each other by some threshold. The coordinates in my list are already in groups on that plane, but the order is very scattered.
Despite this task being rather brain-melting to me at first, I didn't admit defeat instantly; this is what I tried:
First sort the list by the Y value, then sort it by the X value. Run through the list checking the distance between the current set and the previous. If they are close enough (100 units) then add them to the same group.
This method didn't really work out (as I expected). There are still objects that are pretty close that are in different groups, because I'm only comparing the next set in the list and the list is sorted by the X position.
I'm out of ideas! The language I'm using is C but I suppose that's not really relevant since all I need is an idea for how the algorithm should work. Thanks!
Though I haven't looked at the data set, it seems that you already know how many groups there are. Have you considered using k means? http://en.m.wikipedia.org/wiki/K-means_clustering
I'm just thinking this along while I write.
Added later: Regarding 3., the set of points to be investigated for one neighbour can be coarsely but quickly (!) determined by looking at the rectangle enclosing the segment. [End of addition]
This is just an idea - I can't claim that I've ever done anything remotely like this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With