I have a cluster plot made in R and want to apply the "elbow criterion" using a WSS (within-cluster sum of squares) plot, but I do not know how to draw a WSS plot for a given clustering. Can anyone help?
Here is my data:
Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096)
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1)
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029)
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067)
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188)
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108)
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046)
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088)
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025)
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1)
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146)
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442)
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)
data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)
And here is my clustering code:
cor <- cor (data)
dist<-dist(cor)
hclust<-hclust(dist)
plot(hclust)
Running the code above gives me a dendrogram, but how can I draw a WSS plot for this clustering?
If I follow what you want, then we need a function to compute WSS:
wss <- function(d) {
  ## centre each column, then sum the squared deviations
  sum(scale(d, scale = FALSE)^2)
}
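As a quick sanity check (a sketch, not part of the code above), the value returned by wss() for a whole data set should match the sum of the column variances multiplied by n - 1, because both are the sum of squared deviations from the column means. The iris columns are used here purely for illustration, and the names total_ss and by_hand are made up for this example:

```r
## wss() as defined above: sum of squared deviations from the column means
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}

x <- iris[, 1:4]                 # numeric columns only

total_ss <- wss(x)
## same quantity computed column by column: (n - 1) * var for each column
by_hand <- sum(apply(x, 2, var)) * (nrow(x) - 1)

all.equal(total_ss, by_hand)     # the two computations should agree
```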
and a wrapper for this wss() function:
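wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)      ## cluster membership for an i-cluster cut
  spl <- split(x, cl)      ## split the rows of x by cluster
  sum(sapply(spl, wss))    ## per-cluster WSS, summed over clusters
}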
This wrapper takes the following arguments:
i: the number of clusters to cut the data into
hc: the hierarchical cluster analysis object
x: the original data
wrap() then cuts the dendrogram into i clusters, splits the original data according to the cluster membership given by cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.
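To make the steps inside the wrapper concrete, here is a minimal walk-through for a single cut. It assumes the iris measurements and a 3-cluster cut, both chosen only for illustration; per_cluster is a name introduced for this sketch:

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)

x  <- iris[, 1:4]
hc <- hclust(dist(x), method = "ward.D")

cl  <- cutree(hc, k = 3)          # cluster membership for a 3-cluster cut
spl <- split(x, cl)               # list of 3 data frames, one per cluster
per_cluster <- sapply(spl, wss)   # WSS of each cluster

per_cluster
sum(per_cluster)                  # WSS for the 3-cluster solution

## limiting case: with one cluster per observation the WSS is zero
sum(sapply(split(x, cutree(hc, k = nrow(x))), wss))
```

With a single cluster the result is just wss(x), the total sum of squares, so the curve always runs from that value down to zero.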
We run all of this using sapply() over the number of clusters 1, 2, ..., nrow(data), where cl is the object returned by hclust():
res <- sapply(seq.int(1, nrow(data)), wrap, hc = cl, x = data)
A screeplot can be drawn using
plot(seq_along(res), res, type = "b", pch = 19)
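For the data in the question, the dendrogram is built from dist(cor(data)), so the objects being clustered are the 13 rows of the correlation matrix rather than the rows of data. One way to read "the original data" in that case (an assumption on my part) is to pass the correlation matrix, converted to a data frame, as x. The names cm, hc_q and res_q are made up for this sketch:

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

## data from the question
data <- data.frame(
  Friendly   = c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096),
  Polite     = c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1),
  Praising   = c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029),
  Joking     = c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067),
  Sincere    = c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188),
  Serious    = c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108),
  Hostile    = c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046),
  Rude       = c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088),
  Blaming    = c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025),
  Insincere  = c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1),
  Commanding = c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146),
  Suggesting = c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442),
  Neutral    = c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)
)

cm   <- as.data.frame(cor(data))   # rows = the 13 items being clustered
hc_q <- hclust(dist(cm))           # same clustering as in the question

res_q <- sapply(seq_len(nrow(cm)), wrap, hc = hc_q, x = cm)
plot(seq_along(res_q), res_q, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "WSS")
```

Converting to a data frame matters here: split() on a plain matrix would split its elements as a vector rather than its rows.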
Here is an example using the well-known Edgar Anderson Iris data set:
iris2 <- iris[, 1:4] # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")
## Takes a little while as we evaluate all implied clusterings up to 150 groups
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19)
This gives a plot of WSS against the number of clusters, from 1 to 150.
We can zoom in by showing just the first 50 clusterings:
plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19)
You can speed up the main computation step either by running the sapply() call via an appropriate parallelised alternative, or by doing the computation for fewer than nrow(data) clusters, e.g.
res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups
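One possible parallelised alternative is mclapply() from the base parallel package. This is only a sketch: the choice of two cores is arbitrary, and because mclapply() relies on forking, it has to fall back to a single core on Windows. The names n_cores and res_par are introduced for this example:

```r
library(parallel)

wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

iris2 <- iris[, 1:4]
cl <- hclust(dist(iris2), method = "ward.D")

## forking is unavailable on Windows, where mclapply() needs mc.cores = 1
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

## mclapply() returns a list, so unlist() it to get a numeric vector
res_par <- unlist(mclapply(seq_len(50), wrap, hc = cl, x = iris2,
                           mc.cores = n_cores))

plot(seq_along(res_par), res_par, type = "o", pch = 19)
```

For a data set as small as iris the overhead of starting workers may outweigh any gain, so restricting the range of cluster counts is likely the simpler speed-up.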