
Silhouette plot in R

Tags:

r

I have a set of data containing: item, associated cluster, silhouette coefficient. I can further augment this data set with more information if necessary.

I would like to generate a silhouette plot in R. I am having trouble because the examples I have come across run the built-in kmeans (or a related) clustering function and plot its result. I want to bypass that step and produce the plot for my own clustering algorithm, but I can't work out the correct arguments to pass to the plot function.

Thank you.

EDIT

Data set example https://pastebin.mozilla.org/8853427

What I've tried is loading the dataset and passing it to the plot function using various arguments based on https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/silhouette.html

andrei asked Nov 30 '15 12:11



1 Answer

The silhouette function in the cluster package can do the plots for you. It only needs a vector of cluster memberships (produced by whatever algorithm you choose) and a dissimilarity matrix (preferably the same one that was used to produce the clusters). For example:

library(cluster)
library(vegan)  # provides vegdist() and the varespec data set
data(varespec)
dis <- vegdist(varespec)  # Bray-Curtis dissimilarities by default
res <- pam(dis, 3)  # or whatever your choice of clustering algorithm is
sil <- silhouette(res$clustering, dis)  # or use your own cluster vector
dev.new()  # separate graphics window; RStudio sometimes does not display silhouette plots correctly
plot(sil)
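
The same call works when the labels come from outside the cluster package. Here is a minimal sketch, assuming the labels are available as an integer vector in the same order as the observations. The built-in iris data and the hclust/cutree step are only stand-ins for your own data and algorithm; they are not part of the original example.

```r
library(cluster)

# Stand-in data; replace with your own observations
x <- iris[, 1:4]
dis <- dist(x)  # Euclidean dissimilarities between observations

# Stand-in for a custom algorithm: any integer label vector, one entry per row of x
my_clusters <- cutree(hclust(dis), k = 3)

sil <- silhouette(my_clusters, dis)  # n x 3 matrix: cluster, neighbor, sil_width
summary(sil)$avg.width               # average silhouette width over all items
plot(sil)
```

The important point is that the label vector and the dissimilarity object refer to the same items in the same order; silhouette does not need to know how the labels were produced.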

EDIT: For k-means (which minimises squared Euclidean distance):

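library(vegan)  # only needed here for the varespec data set
library(cluster)
data(varespec)
dis <- dist(varespec)^2  # squared Euclidean distances, matching the k-means criterion
res <- kmeans(varespec, 3)
sil <- silhouette(res$cluster, dis)
dev.new()  # separate graphics window, as above
plot(sil)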
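
If only the per-item silhouette coefficients are available and the dissimilarities are not, a similar picture can be drawn with base graphics. This is a rough sketch, assuming a data frame with hypothetical columns item, cluster and sil. The values below are made up for illustration. It imitates the layout of plot(sil) but does not use the cluster package.

```r
# Made-up example data; in practice read your own file, e.g. with read.csv()
d <- data.frame(
  item    = paste0("i", 1:8),
  cluster = c(2, 1, 3, 1, 2, 1, 3, 2),
  sil     = c(0.55, 0.61, 0.12, -0.05, 0.72, 0.48, 0.41, 0.30)
)

# Order by cluster, then by decreasing silhouette width within each cluster
d <- d[order(d$cluster, -d$sil), ]

# barplot() draws the first bar at the bottom, so reverse to put cluster 1 on top
barplot(rev(d$sil), names.arg = rev(as.character(d$item)),
        horiz = TRUE, las = 1, border = NA,
        col = rev(d$cluster) + 1,  # one palette colour per cluster
        xlab = "Silhouette width", main = "Silhouette plot")
abline(v = mean(d$sil), lty = 2)   # dashed line at the average silhouette width
```

Unlike plot(sil), this version does not print cluster sizes or per-cluster averages; those would have to be added separately, for example with tapply(d$sil, d$cluster, mean).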
Philip Perrin answered Nov 01 '22 01:11
