k-means in Python: Determine which data are associated with each centroid

Question

I've been using scipy.cluster.vq.kmeans to do some k-means clustering, but was wondering if there's a way to determine which centroid each data point is (putatively) associated with.

Clearly you could do this manually, but as far as I can tell the kmeans function doesn't return this information. Is there a built-in way?

Steve Tjoa · Accepted Answer

There is a function kmeans2 in scipy.cluster.vq that returns the labels, too.

In [7]: import numpy as np; from scipy.cluster.vq import kmeans2

In [8]: X = np.random.randn(100, 2)

In [9]: centroids, labels = kmeans2(X, 3)

In [10]: labels
Out[10]: 
array([2, 1, 2, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 2, 1, 2, 1, 2, 1, 2, 0,
       1, 0, 2, 0, 1, 2, 0, 1, 0, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 0, 0,
       2, 2, 0, 1, 0, 0, 0, 2, 2, 2, 0, 0, 1, 2, 1, 0, 0, 0, 2, 1, 1, 1, 1,
       1, 0, 0, 1, 0, 1, 2, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 2, 0, 2, 2, 0,
       1, 1, 0, 1, 0, 0, 0, 2])
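As a rough sketch of what you can do with those labels, here is a self-contained script that groups the points by cluster. The seeded generator, the minit="points" choice, and the variable name members are my own additions for illustration, not part of the session above, and the exact labels you get will differ from run to run.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Assumption: seeded random data, only so the input is reproducible.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# kmeans2 returns the centroids and, for each row of X, the index of
# the centroid that row was assigned to.
centroids, labels = kmeans2(X, 3, minit="points")

# Boolean masking on the label array selects the members of each cluster.
for k in range(len(centroids)):
    members = X[labels == k]
    print(f"cluster {k}: {len(members)} points, centroid {centroids[k]}")
```

labels[i] is the row index into centroids for X[i], so centroids[labels] gives the assigned centroid of every point in one indexing step.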

Otherwise, if you must use kmeans, you can also use vq to get labels:

In [17]: from scipy.cluster.vq import kmeans, vq

In [18]: codebook, distortion = kmeans(X, 3)

In [21]: code, dist = vq(X, codebook)

In [22]: code
Out[22]: 
array([1, 0, 1, 0, 2, 2, 2, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
       2, 2, 1, 2, 0, 1, 1, 0, 2, 2, 0, 1, 0, 1, 0, 2, 1, 2, 0, 2, 1, 1, 1,
       0, 1, 2, 0, 1, 2, 2, 1, 1, 1, 2, 2, 0, 0, 2, 2, 2, 2, 1, 0, 2, 2, 2,
       0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 2, 1, 2, 0, 2, 0, 2, 2, 1, 1, 1, 1, 1,
       2, 0, 2, 0, 2, 1, 1, 1])
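For comparison, this is roughly what the "manual" approach mentioned in the question looks like next to vq. It is only a sketch: the seeded data and the names d and manual are assumptions of mine, and it relies on vq assigning each observation to its nearest codebook entry by Euclidean distance.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Assumption: seeded random data, only so the input is reproducible.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# kmeans returns only the codebook (centroids) and a distortion value.
codebook, distortion = kmeans(X, 3)

# vq assigns each observation to its nearest codebook entry.
code, dist = vq(X, codebook)

# Manual equivalent: pairwise Euclidean distances from every point to
# every centroid, then the index of the smallest distance per point.
d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
manual = d.argmin(axis=1)

print(np.array_equal(code, manual))
```

Note that kmeans can return fewer centroids than requested, so it is safer to take the number of clusters from len(codebook) than to assume it is 3.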

Documentation: scipy.cluster.vq

Tags: python, scipy, cluster-analysis, k-means

Alex
