
How can silhouette scores be negative?

Suppose we have some datapoints (image not available) and we segment them with, for example, k-means. Aren't the resulting segments such that every point is closest to the center-of-mass of its own cluster? If so, when the silhouette score compares ai (the average distance to the other points in the same cluster) with bi (the average distance to the points of the nearest other cluster), how can the score ever be negative, that is, how can bi be less than ai?

I can see how this might happen with other clustering algorithms: more sophisticated ones may cluster differently, or some points may be assigned incorrectly. But how does it happen with k-means?

asked Aug 28 '20 19:08 by zliangmd



1 Answer

A point i's average distance to points in a cluster is not the same as its distance to the center-of-mass of that cluster. Silhouette score uses the former when calculating a(i) and b(i), while k-means uses the latter in cluster assignment, so there may be disagreement.
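To make the two criteria concrete, here is the standard silhouette definition next to the k-means assignment rule. The notation is an assumption introduced for illustration: C_I is the cluster containing point i, d is the distance function, x_i is the coordinate vector of point i, and mu_J is the center-of-mass of cluster C_J.

```latex
% Silhouette: averages of point-to-point distances
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\ j \neq i} d(i, j)
\qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)
\qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}

% k-means: distance to the center-of-mass of each cluster
\text{cluster}(i) = \arg\min_{J} \lVert x_i - \mu_J \rVert^2,
\qquad
\mu_J = \frac{1}{|C_J|} \sum_{j \in C_J} x_j
```

The sign of s(i) depends only on whether b(i) exceeds a(i), and neither quantity involves mu_J, so the k-means rule does not guarantee a positive value.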

For example, in the image below, suppose the blue points are already assigned to one cluster and the green points to another. To which cluster will the red point be assigned? The center-of-mass of the blue cluster is at (0, 1) and the center-of-mass of the green cluster is at (0, -1.15). The blue center is closer to the red point, so k-means assigns the red point to the blue cluster. However, its average distance to the blue points is 1.414 while its average distance to the green points is only 1.15, so a(i) > b(i) and the red point gets a negative silhouette score: (1.15 - 1.414) / 1.414 ≈ -0.19.

[image: silhouette score negative example]
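The arithmetic in this example can be checked with a short script. This is only a sketch: the original figure is not available, so the coordinates below are hypothetical, chosen so that the centers-of-mass and average distances match the numbers quoted above. The helper names `centroid` and `mean_distance` are also made up for illustration.

```python
import math

# Hypothetical coordinates (an assumption; the original figure is not
# available), chosen to reproduce the numbers quoted in the answer.
red = (0.0, 0.0)
blue = [(-1.0, 1.0), (1.0, 1.0)]   # center-of-mass at (0, 1)
green = [(0.0, -1.15)]             # center-of-mass at (0, -1.15)


def centroid(points):
    """Center-of-mass of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def mean_distance(point, others):
    """Average Euclidean distance from `point` to every point in `others`."""
    return sum(math.dist(point, q) for q in others) / len(others)


# k-means criterion: distance from the red point to each center-of-mass.
d_blue_center = math.dist(red, centroid(blue))
d_green_center = math.dist(red, centroid(green))
assigned = "blue" if d_blue_center < d_green_center else "green"

# Silhouette criterion: average distance to the points themselves.
# `blue` lists only the other members of the red point's cluster.
a = mean_distance(red, blue)
b = mean_distance(red, green)
s = (b - a) / max(a, b)

print(f"assigned to {assigned}: center distances "
      f"{d_blue_center:.3f} vs {d_green_center:.3f}")
print(f"a = {a:.3f}, b = {b:.3f}, silhouette = {s:.3f}")
```

With these coordinates the red point is nearer to the blue center, yet its average distance to the blue points exceeds its distance to the green point, which is exactly the disagreement described above.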

answered Dec 31 '22 19:12 by Burrito