I've been using scipy.cluster.vq.kmeans
for doing some k-means clustering, but was wondering if there's a way to determine which centroid each of your data points is (putativly) associated with.
Clearly you could do this manually, but as far as I can tell the kmeans function doesn't return this?
There is a function kmeans2
in scipy.cluster.vq
that returns the labels, too.
In [8]: X = scipy.randn(100, 2)
In [9]: centroids, labels = kmeans2(X, 3)
In [10]: labels
Out[10]:
array([2, 1, 2, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 2, 1, 2, 1, 2, 1, 2, 0,
1, 0, 2, 0, 1, 2, 0, 1, 0, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 0, 0,
2, 2, 0, 1, 0, 0, 0, 2, 2, 2, 0, 0, 1, 2, 1, 0, 0, 0, 2, 1, 1, 1, 1,
1, 0, 0, 1, 0, 1, 2, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 2, 0, 2, 2, 0,
1, 1, 0, 1, 0, 0, 0, 2])
Otherwise, if you must use kmeans
, you can also use vq
to get labels:
In [17]: from scipy.cluster.vq import kmeans, vq
In [18]: codebook, distortion = kmeans(X, 3)
In [21]: code, dist = vq(X, codebook)
In [22]: code
Out[22]:
array([1, 0, 1, 0, 2, 2, 2, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
2, 2, 1, 2, 0, 1, 1, 0, 2, 2, 0, 1, 0, 1, 0, 2, 1, 2, 0, 2, 1, 1, 1,
0, 1, 2, 0, 1, 2, 2, 1, 1, 1, 2, 2, 0, 0, 2, 2, 2, 2, 1, 0, 2, 2, 2,
0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 2, 1, 2, 0, 2, 0, 2, 2, 1, 1, 1, 1, 1,
2, 0, 2, 0, 2, 1, 1, 1])
Documentation: scipy.cluster.vq
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With