I have a set containing thousands of addresses. If I can get the longitude and latitude of each address, how do I split the set into groups by proximity?
Further, I may want to retry the 'clustering' according to different rules:
You could try the k-means clustering algorithm.
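For illustration, here is a minimal k-means sketch in plain Python (3.8+), not a tuned implementation. It assumes each address has already been geocoded to a (latitude, longitude) pair in degrees; the function names `to_unit_vector` and `kmeans` are hypothetical. Coordinates are mapped to 3D unit vectors first so that straight-line distance is meaningful near the poles and across the 180° meridian.

```python
import math
import random

def to_unit_vector(lat_deg, lon_deg):
    """Map a latitude/longitude pair (degrees) to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def kmeans(points, k, iterations=50, seed=0):
    """Return one cluster label (0..k-1) per (lat, lon) point."""
    rng = random.Random(seed)
    vectors = [to_unit_vector(lat, lon) for lat, lon in points]
    # Start from k distinct input points chosen at random.
    centroids = rng.sample(vectors, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels
```

Calling `kmeans(coords, k=10)` returns a list of labels in the same order as `coords`. To retry under a different rule, change `k` or the seed and run it again; for thousands of addresses a library implementation would likely be faster.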
You want vector quantization:
http://en.wikipedia.org/wiki/Vector_quantization
"It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms."
Here the vectors are the geographic coordinates of each address, and you can feed your algorithms with other parameters depending on your constraints (proximity, group size, number of groups...).
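To show how constraints such as proximity and group size can become parameters, here is a rough sketch of a greedy grouping pass. It is a stand-in for illustration, not vector quantization proper: each point joins the first group whose seed point lies within `max_km`, subject to an optional `max_size`. The names `haversine_km` and `group_by_radius` are hypothetical, and the Earth is assumed to be a sphere of radius 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def group_by_radius(points, max_km, max_size=None):
    """Greedily group points; the first point of each group is its seed."""
    groups = []
    for p in points:
        for g in groups:
            close_enough = haversine_km(p, g[0]) <= max_km
            has_room = max_size is None or len(g) < max_size
            if close_enough and has_room:
                g.append(p)
                break
        else:
            # No existing group accepted the point, so it seeds a new one.
            groups.append([p])
    return groups
```

Re-running with a different `max_km` or `max_size` gives a different partition of the same addresses. The result depends on input order, so this is best treated as a quick first pass rather than an optimal grouping.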
You can start with k-means, but in my experience a Voronoi-based algorithm is more flexible. A good introduction here.