Given a database of geographical locations (long/lat), what would be the best approach to determining/detecting clusters of locations that are within x miles of the cluster center AND total at least y locations?
e.g. out of 1000 McWidgets in NC, there are 30 clusters each containing 20 or more stores within 7 miles of their respective cluster center.
It's been a long time since my applied math course in college... any help for an old mushy brain would be greatly appreciated.
Distance based methods optimize a global criteria based on the distance between the patterns. k-means, CLARA, CLARANS are examples of dis- tance based clustering method. Density based methods optimize local criteria based on density information of the patterns.
Divide the total by the number of members of the cluster. In the example above, 283 divided by four is 70.75, and 213 divided by four is 53.25, so the centroid of the cluster is (70.75, 53.25).
The k-means nearest-neighbor (k-means) algorithm belongs to the partitioning clustering methods in which the mean value of the objects or observation in a cluster is used as a center of the cluster, which is also regarded as the center of gravity for a cluster.
The algorithm proceeds by arbitrarily picking up a point in the dataset (until all points have been visited). If there are at least 'minPoint' points within a radius of 'ε' to the point then we consider all these points to be part of the same cluster.
A common method for this kind of problem is Density-based Spatial Clustering of Applications with Noise (DBSCAN). A variation which may be a better choice, if you can't determine a good density parameter, is the Ordering Points To Identify the Clustering Structure (OPTICS) algorithm, which uses a distance parameter, rather than a density parameter.
You probably need one of the Clustering algorithms.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With