How to use silhouette score in k-means clustering from sklearn library?

Tags: machine-learning, python-2.7, k-means, scikit-learn, silhouette

I'd like to use the silhouette score in my script to automatically choose the number of clusters for k-means clustering with sklearn.

import numpy as np
import pandas as pd
import csv
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

filename = "CSV_BIG.csv"

# Read the CSV file with the Pandas lib.
path_dir = ".\\"
dataframe = pd.read_csv(path_dir + filename, encoding = "utf-8", sep = ';' ) # "ISO-8859-1")
df = dataframe.copy(deep=True)

#Use silhouette score
range_n_clusters = list (range(2,10))
print ("Number of clusters from 2 to 9: \n", range_n_clusters)

for n_clusters in range_n_clusters:
    clusterer = KMeans (n_clusters=n_clusters).fit(?)
    preds = clusterer.predict(?)
    centers = clusterer.cluster_centers_

    score = silhouette_score (?, preds, metric='euclidean')
    print ("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))

Can someone help me with the question marks? I don't understand what to put in their place. I took the code from an example. The commented part is the previous version, where I ran k-means clustering with the number of clusters fixed at 4. That version works, but in my project I need to choose the number of clusters automatically.


asked Jul 02 '18 14:07

Jessica Martini

1 Answer

I am assuming you want to use the silhouette score to find the optimal number of clusters.
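For reference, the silhouette coefficient of a sample i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points in its own cluster and b(i) is the mean distance from i to the points of the nearest other cluster. silhouette_score returns the mean of s(i) over all samples, so it lies between -1 and 1 and higher values mean compact, well-separated clusters. The sketch below recomputes it by hand with NumPy and compares it with sklearn's value. It assumes synthetic make_blobs data rather than the asker's CSV, and manual_silhouette is a hypothetical helper written only to illustrate the definition; it builds the full distance matrix, so it is only suitable for small inputs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def manual_silhouette(X, labels):
    """Hypothetical helper: mean silhouette coefficient computed from the definition."""
    # Full pairwise Euclidean distance matrix (n x n)
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            # sklearn sets s(i) = 0 for a sample alone in its cluster
            continue
        # a(i): mean distance to the other members of its own cluster (dist[i, i] is 0)
        a = dist[i, own].sum() / (n_own - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == k].mean() for k in unique if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Synthetic data (assumption): 3 Gaussian blobs in 2-D
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(manual_silhouette(X, labels), silhouette_score(X, labels))
```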

First create a separate KMeans object for each number of clusters, then call its fit_predict method on your data df, like this:

for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters)
    preds = clusterer.fit_predict(df)
    centers = clusterer.cluster_centers_

    score = silhouette_score(df, preds)
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))
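The loop above prints a score for each candidate k but leaves the choice to you. A self-contained sketch that also keeps the k with the highest mean silhouette might look like the following. It is an illustration under assumptions: synthetic make_blobs data wrapped in a DataFrame stands in for CSV_BIG.csv, and the text column label_text is hypothetical, added only to show that non-numeric columns have to be dropped (here with select_dtypes) before fit_predict, because KMeans needs numeric input. n_init and random_state are set explicitly just to make runs repeatable.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the CSV (assumption): 4 blobs plus a text column
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)
df = pd.DataFrame(X, columns=["x", "y"])
df["label_text"] = "row"  # hypothetical non-numeric column

# Keep only numeric columns and drop rows with missing values
features = df.select_dtypes(include="number").dropna()

scores = {}
for n_clusters in range(2, 10):
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    preds = clusterer.fit_predict(features)
    scores[n_clusters] = silhouette_score(features, preds)
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, scores[n_clusters]))

# Higher is better: keep the k with the largest mean silhouette
best_k = max(scores, key=scores.get)
print("Best number of clusters:", best_k)
```

On real data the scores can be close for several values of k, so it is worth reading the printed values rather than relying only on the maximum.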

See the official scikit-learn example on selecting the number of clusters with silhouette analysis on KMeans clustering for more detail.
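scikit-learn's silhouette analysis example for KMeans also looks at the silhouette value of every individual sample, which sklearn exposes as silhouette_samples. A rough sketch that summarizes those values per cluster, again assuming synthetic make_blobs data instead of the asker's CSV:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data (assumption): 4 Gaussian blobs in 2-D
X, _ = make_blobs(n_samples=300, centers=4, random_state=1)
n_clusters = 4
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

# One silhouette value per sample; their mean is what silhouette_score returns
sample_values = silhouette_samples(X, labels)
print("Mean:", sample_values.mean(), "silhouette_score:", silhouette_score(X, labels))

# Per-cluster summary: a low mean or many negative values suggest a poor cluster
for k in range(n_clusters):
    vals = sample_values[labels == k]
    print("cluster {}: size={}, mean={:.3f}, negative={}".format(
        k, vals.size, vals.mean(), int((vals < 0).sum())))
```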


answered Sep 19 '22 11:09

Gambit1614
