Clustering geo location coordinates (lat,long pairs) using KMeans algorithm with Python

Using the following code to cluster geolocation coordinates results in 3 clusters:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.vq import kmeans2, whiten

    coordinates= np.array([
               [lat, long],
               [lat, long],
                ...
               [lat, long]
               ])
    x, y = kmeans2(whiten(coordinates), 3, iter = 20)  
    plt.scatter(coordinates[:,0], coordinates[:,1], c=y);
    plt.show()

Is it right to use Kmeans for location clustering, as it uses Euclidean distance and not Haversine formula as a distance function?

What does K mean in coordinates?

The k-means algorithm groups N observations (i.e., rows in an array of coordinates) into k clusters. However, k-means is not an ideal algorithm for latitude-longitude spatial data because it minimizes variance, not geodetic distance.

k-means is not a good algorithm to use for spatial clustering, for the reasons you meantioned. Instead, you could do this clustering job using scikit-learn's DBSCAN with the haversine metric and ball-tree algorithm.

This tutorial demonstrates clustering latitude-longitude spatial data with DBSCAN/haversine and avoids all those Euclidean-distance problems:

df = pd.read_csv('gps.csv')
coords = df.as_matrix(columns=['lat', 'lon'])
db = DBSCAN(eps=eps, min_samples=ms, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))

Note that this specifically uses scikit-learn v0.15, as some earlier/later versions seem to require a full distance matrix to be computed. Also notice that the eps value is in radians and that .fit() takes the coordinates in radian units for the haversine metric.

Clustering geo location coordinates (lat,long pairs) using KMeans algorithm with Python

Tags:

python

geolocation

numpy

scipy

k-means

rok

People also ask

1 Answers

eos

Recent Activity

Donate For Us

Clustering geo location coordinates (lat,long pairs) using KMeans algorithm with Python

Tags:

python

geolocation

numpy

scipy

k-means

rok

People also ask

1 Answers

eos

Related questions

Recent Activity

Donate For Us