How to calculate the silhouette score of SciPy's fcluster results using scikit-learn's silhouette_score?

I am using scipy.cluster.hierarchy.linkage as a clustering algorithm and passing the resulting linkage matrix to scipy.cluster.hierarchy.fcluster to get flat clusters for various thresholds.

I would like to calculate the silhouette score of each result and compare the scores to choose the best threshold. I would prefer not to implement it myself but to use scikit-learn's sklearn.metrics.silhouette_score. How can I rearrange my clustering results so they can be passed to sklearn.metrics.silhouette_score?

J.J asked Jan 10 '15 10:01
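One way to choose a threshold is to loop over candidate values of fcluster's `t` argument, score each flat clustering with `silhouette_score`, and keep the best. The sketch below is illustrative only: it assumes Euclidean features in an array `X`, `method="average"` and `criterion="distance"`, and the helper name `best_threshold`, the synthetic data and the threshold grid are all hypothetical. Thresholds that produce a single cluster (or one cluster per sample) are skipped, because the silhouette is only defined for 2 <= n_labels <= n_samples - 1.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_threshold(X, thresholds, method="average"):
    """Hypothetical helper: return (best t, its silhouette, all scores)."""
    condensed = pdist(X)  # condensed Euclidean distances, as linkage expects
    square = squareform(condensed)  # full square matrix for silhouette_score
    Z = linkage(condensed, method=method)
    scores = {}
    for t in thresholds:
        labels = fcluster(Z, t, criterion="distance")
        n_clusters = len(np.unique(labels))
        # The silhouette is undefined for 1 cluster or n_samples clusters.
        if 2 <= n_clusters <= len(X) - 1:
            scores[t] = silhouette_score(square, labels, metric="precomputed")
    best = max(scores, key=scores.get)
    return best, scores[best], scores


# Hypothetical input: three well-separated groups of points.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)
t_best, s_best, all_scores = best_threshold(X, np.linspace(0.5, 10.0, 20))
print(f"best threshold {t_best:.2f}, silhouette {s_best:.3f}")
```

Passing the square distance matrix with `metric="precomputed"` keeps the score consistent with the distances the linkage actually used. For Euclidean features, passing `X` with the default metric gives the same value.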


People also ask

How is silhouette score calculated?

The Silhouette Coefficient is calculated from the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) of each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not part of.
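The formula can be checked by computing a, b and (b - a) / max(a, b) directly for every sample and comparing the result with scikit-learn's `silhouette_samples`. This is a minimal sketch assuming Euclidean distances. The helper `silhouette_by_hand` and the toy points are hypothetical, and singleton clusters are given a coefficient of 0, matching scikit-learn's convention.

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_samples


def silhouette_by_hand(X, labels):
    """Hypothetical helper: per-sample s = (b - a) / max(a, b)."""
    D = pairwise_distances(X)  # Euclidean distances between all samples
    labels = np.asarray(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            continue  # singleton cluster: coefficient stays 0
        a = D[i, same].mean()  # mean intra-cluster distance
        # b: smallest mean distance to the points of any other cluster
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0], [10.0, 0.0]])
labels = np.array([1, 1, 2, 2, 2])
print(silhouette_by_hand(X, labels))
print(silhouette_samples(X, labels))
```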

How is the silhouette score used with k-means?

Silhouette analysis: the silhouette coefficient measures how similar a data point is to its own cluster (cohesion) compared to other clusters (separation). Select a range of values of k (say 2 to 10; the coefficient is undefined for a single cluster), then plot the silhouette coefficient for each value of k.
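The procedure above can be sketched with scikit-learn's `KMeans` and `silhouette_score`. This version records the scores in a dictionary and prints the best k instead of plotting. The four-group synthetic data, `n_init=10` and the random seeds are assumptions chosen only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical input: four well-separated groups of points.
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
                  cluster_std=0.6, random_state=42)

scores = {}
for k in range(2, 11):  # k = 1 is skipped: the silhouette needs >= 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

To get the plot, `scores` can be passed to any plotting library, with k on the x-axis and the silhouette on the y-axis.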

What is Silhouette score in clustering?

The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters.

What does the silhouette score indicate?

The silhouette score is used to evaluate the quality of clusters produced by algorithms such as k-means, in terms of how well each sample is grouped with other samples that are similar to it. A silhouette coefficient is computed for every sample, and the overall score is the mean of these per-sample coefficients.
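The relationship between the per-sample coefficients and the overall score can be seen with `silhouette_samples` and `silhouette_score`. Averaging the per-sample values per cluster is one way to spot a weak cluster. The toy points and labels below are hypothetical; the last point is deliberately placed between the two groups, so its coefficient is lower than that of the other members of its cluster.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [6.0, 6.0], [6.0, 7.0], [3.0, 3.5]])
labels = np.array([0, 0, 0, 1, 1, 1])  # the last point is a borderline member of cluster 1

per_sample = silhouette_samples(X, labels)  # one coefficient per sample
overall = silhouette_score(X, labels)  # mean of the per-sample coefficients

for c in np.unique(labels):
    print(f"cluster {c}: mean silhouette {per_sample[labels == c].mean():.3f}")
print(f"overall: {overall:.3f}")
```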


1 Answer

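You don't have to rearrange anything.

fcluster returns a 1-D array with one flat-cluster label per observation, which is exactly the form silhouette_score expects for its labels argument, so the result can be passed in directly.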

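```python
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn import metrics

square = distmatrix + distmatrix.T  # assumes distmatrix holds only the upper triangle of the distances
ddgm = linkage(squareform(square), method="average")  # linkage expects the condensed form
nodes = fcluster(ddgm, 4, criterion="maxclust")  # one flat-cluster label per observation
metrics.silhouette_score(square, nodes, metric="precomputed")  # square holds distances, not features
```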
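For a self-contained check, the sketch below builds a hypothetical feature matrix, clusters it with `linkage` and `fcluster`, and passes the labels straight to `silhouette_score`. fcluster numbers its clusters from 1, which scikit-learn accepts as-is. Because the distances here are Euclidean, scoring the square distance matrix with `metric="precomputed"` should agree with scoring the raw features with `metric="euclidean"`. The data and the four-cluster choice are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical feature matrix standing in for real data.
X, _ = make_blobs(n_samples=60, centers=[[0, 0], [8, 0], [0, 8], [8, 8]],
                  cluster_std=0.7, random_state=1)

condensed = pdist(X, metric="euclidean")  # condensed form for linkage
Z = linkage(condensed, method="average")
nodes = fcluster(Z, 4, criterion="maxclust")  # labels 1..k, one per sample

# The labels go straight into scikit-learn; no re-encoding is needed.
from_matrix = silhouette_score(squareform(condensed), nodes, metric="precomputed")
from_features = silhouette_score(X, nodes, metric="euclidean")
print(nodes.min(), nodes.max(), from_matrix, from_features)
```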
mlworker answered Oct 06 '22 00:10