Sklearn kmeans equivalent of elbow method

Let's say I'm examining up to 10 clusters. With scipy I usually generate the 'elbow' plot as follows:

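from matplotlib import pyplot
from scipy import cluster

# kmeans returns a (codebook, distortion) pair for each number of clusters
cluster_array = [cluster.vq.kmeans(my_matrix, i) for i in range(1,10)]

pyplot.plot([var for (cent,var) in cluster_array])
pyplot.show()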

I have since become motivated to use sklearn for clustering, but I'm not sure how to create the array needed for the plot as in the scipy case. My best guess was:

from sklearn.cluster import KMeans

km = [KMeans(n_clusters=i) for i range(1,10)]
cluster_array = [km[i].fit(my_matrix)]

That unfortunately resulted in an invalid command error. What is the best sklearn way to go about this?

Thank you

asked Jan 09 '17 03:01

Arash Howaida

3 Answers

You can use the inertia_ attribute of the KMeans class.

Assuming X is your dataset:

from sklearn.cluster import KMeans
from matplotlib import pyplot as plt

# X = <your_data>, an array of shape (n_samples, n_features)
distortions = []
for k in range(2, 20):
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(X)
    distortions.append(kmeans.inertia_)

fig = plt.figure(figsize=(15, 5))
plt.plot(range(2, 20), distortions)
plt.grid(True)
plt.title('Elbow curve')
plt.show()

answered Oct 21 '22 06:10

Ahmed Besbes

You had some syntax problems in the code. They should be fixed now:

from sklearn.cluster import KMeans

Ks = range(1, 10)
km = [KMeans(n_clusters=i) for i in Ks]
score = [model.fit(my_matrix).score(my_matrix) for model in km]

The fit method just returns the estimator itself (self). In this line in the original code

cluster_array = [km[i].fit(my_matrix)]

the cluster_array would end up having the same contents as km.
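
To illustrate the point about fit returning the estimator itself, here is a minimal sketch. The make_blobs data and the n_init and random_state settings are my own assumptions, used only to have something reproducible to fit:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (an assumption, not the asker's my_matrix)
data, _ = make_blobs(n_samples=100, centers=3, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
fitted = model.fit(data)

# fit() hands back the very same object, now carrying fitted attributes
print(fitted is model)  # → True
```

So collecting the return values of fit gives you the fitted models, not a list of numbers you can plot directly.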

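You can use the score method to get an estimate of how well the clustering fits. Note that for KMeans, score returns the negative of the K-means objective (the negated inertia), so the values are negative and move toward zero as the number of clusters grows. To see the score for each number of clusters, simply run pyplot.plot(Ks, score).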
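
As a rough check of how score relates to inertia_, the sketch below fits the same range of cluster counts and compares the two. The synthetic make_blobs data stands in for my_matrix, and n_init and random_state are assumptions added for reproducibility:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data stands in for my_matrix (an assumption for illustration)
my_matrix, _ = make_blobs(n_samples=300, centers=4, random_state=0)

Ks = range(1, 10)
inertias = []
scores = []
for k in Ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(my_matrix)
    inertias.append(model.inertia_)        # sum of squared distances to nearest center
    scores.append(model.score(my_matrix))  # negated objective on the same data

# Compare the two lists up to floating-point tolerance
same_up_to_sign = np.allclose(scores, [-v for v in inertias])
print(same_up_to_sign)
```

Either list can be passed to pyplot.plot(Ks, ...) for the elbow plot; the score curve is just the inertia curve flipped upside down.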

answered Oct 21 '22 06:10

J. P. Petersen

You can also use the Euclidean distance between each data point and its nearest cluster center to evaluate how many clusters to choose. Here is a code example.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

iris = load_iris()
x = iris.data

res = list()
n_cluster = range(2,20)
for n in n_cluster:
    kmeans = KMeans(n_clusters=n)
    kmeans.fit(x)
    res.append(np.average(np.min(cdist(x, kmeans.cluster_centers_, 'euclidean'), axis=1)))

plt.plot(n_cluster, res)
plt.title('elbow curve')
plt.show()
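
For comparison with inertia_, here is a small sketch (fixing n_clusters=3, n_init and random_state are my own choices) showing that the quantity plotted above is the mean unsquared distance, while inertia_ is the sum of the squared distances:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

x = load_iris().data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

# Distance from every sample to its nearest cluster center
nearest = np.min(cdist(x, kmeans.cluster_centers_, 'euclidean'), axis=1)

mean_distance = np.average(nearest)    # the value appended to res above
sum_of_squares = np.sum(nearest ** 2)  # should agree with kmeans.inertia_

print(mean_distance, sum_of_squares, kmeans.inertia_)
```

Both curves tend to bend at a similar place, but the numbers are on different scales, so they should not be mixed in one plot without relabeling the axis.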

answered Oct 21 '22 07:10

lugq


Tags: python-3.x, scipy, scikit-learn