Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

2d point clustering

Given: Given a set of N points in the 2D plane (x and y coordinates), and a set of N radii corresponding to each point. We will refer to a point's disc as the disc centered at the point with its radius.

Problem: Cluster the points. A cluster of points is such that each point EITHER falls within the disc of at least one other point in the cluster OR at least one other point in the cluster falls within its disc. Clusters of single points do not count as clusters.

I need an algorithm to compute this efficiently (preferably without resorting to complicated spatial hashing techniques like kd-trees). I can settle for O(N^2) run time but definitely no more than O(N^3). I feel like this should be simple but I'm getting caught up on the non-reciprocal nature of my clustering condition.

In essence, I am looking to fill in the C function prototype:

int cluster_points(
    size_t n,
    point_t *pt, // length n list of points
    double *radius, // length n list of radii for each point
    int *cluster, // length n list of cluster indexes; -1 if not clustered
    size_t *ncluster // number of clusters (number of distinct non -1 entries in cluster)
);

This is not homework. I need this as part of a matrix algorithm for clustering complex numbers.

like image 785
Victor Liu Avatar asked Oct 14 '10 21:10

Victor Liu


1 Answers

The brute force solution is only O(N2), so it should work for you.

1) Start with all of the points in the unassigned group.

2) Pick one point and look at all the others in the unassigned group and see whether the meet the radii criterion you describe.

  • 2a) If yes, start a group, and continue on with all the other points to see if they fit in this group (based on your current point), and if they fit in, move them from the unassigned group into this new one.
  • 2ab) When done with the current point, go to each point added to the group and check all the points in the unassigned group.
  • 2b) But, if no point matches the radii criteria for the current point, discard it.

3) At the end you will have grouped the points by your criteria, and will have done no more than N*(N/2) inspections.

Btw, what you describe is not what's normally meant by "clustering", so I think that's throwing people off here. What makes clustering a difficult problem is that the question of whether two neighboring points will be assigned to the same cluster is determined by all the other points in the data set. In your case, it's (basically) only determined by properties of the two points, so that in your case, you can just check them all.

like image 137
tom10 Avatar answered Nov 16 '22 03:11

tom10