Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How would I organise a clustered set of 2D coordinates into groups of close together sets?

I have a large amount of 2D sets of coordinates on a 6000x6000 plane (2116 sets), available here: http://pastebin.com/kiMQi7yu (the context isn't really important so I just pasted the raw data).

I need to write an algorithm to group together coordinates that are close to each other by some threshold. The coordinates in my list are already in groups on that plane, but the order is very scattered.

Despite this task being rather brain-melting to me at first, I didn't admit defeat instantly; this is what I tried:

First sort the list by the Y value, then sort it by the X value. Run through the list checking the distance between the current set and the previous. If they are close enough (100 units) then add them to the same group.

This method didn't really work out (as I expected). There are still objects that are pretty close that are in different groups, because I'm only comparing the next set in the list and the list is sorted by the X position.

I'm out of ideas! The language I'm using is C but I suppose that's not really relevant since all I need is an idea for how the algorithm should work. Thanks!

like image 880
Southclaws Avatar asked Jul 07 '14 15:07

Southclaws


2 Answers

Though I haven't looked at the data set, it seems that you already know how many groups there are. Have you considered using k means? http://en.m.wikipedia.org/wiki/K-means_clustering

like image 86
Blake Avatar answered Nov 03 '22 00:11

Blake


I'm just thinking this along while I write.

  1. Tile the "arena" with squares that have the diameter of your distance (200) as their diagonal.
  2. If there are any points within a square (x,y), they are tentatively part of Cluster(x,y).
  3. Within each square (x,y), there are (up to) 4 areas where the circles of Cluster(x-1,y), Cluster(x+1,y), Cluster(x, y-1) and Cluster(x,y+1) overlap "into" the square; of these consider only those Clusters that are tentatively non-empty.
  4. If all points of Cluster(x,y) are in the (up to 4) overlapping segments of non-empty neighbouring clusters: reallocate these points to the pertaining Cluster and remove Cluster(x,y) from the set of non-empty Clusters.

Added later: Regarding 3., the set of points to be investigated for one neighbour can be coarsely but quickly (!) determined by looking at the rectangle enclosing the segment. [End of addition]

This is just an idea - I can't claim that I've ever done anything remotely like this.

like image 34
laune Avatar answered Nov 02 '22 23:11

laune