Given a dendrogram that has k clusters below a height z, I would like to know:
How many observations were merged to form those k clusters?
Here is some reproducible code, with pictures, to illustrate the problem:
#Necessary packages to reproduce the code
library(ggplot2)
library(cluster)
#Example data
x = c(6.2, 2.3, 0, 1.54, 2.17, 6.11, 0.3, 1.39,
5.14, 12.52, 12.57, 7.13, 13.71, 11.42,
8.13, 8.86, 9.97, 10, 8.23, 12.4, 9.51,
20.56, 17.78, 14.91, 19.17, 17.48, 17.44,
21.32, 21.24)
y = c(7.89, 7.63, 5.29, 8.38, 8.37, 10.5, 21.5,
16.65, 23.76, 1.77, 1.8, 10.49, 14.01,
10.36, 10.85, 15.02, 14.91, 14.94, 10.76,
18.58, 23.12, 0, 13.59, 9.68, 17.32, 17.85,
17.79, 4.13, 4.05)
df = data.frame(x, y) #one row per observation, columns x and y
obs = nrow(df) #number of observations
obs
[1] 29
#Clustering
agnes=agnes(df, metric="euclidean", stand=FALSE, method="average")
k_number=sum(agnes$height < 1) #number of clusters under dendrogram's height value of 1
k_number
[1] 7 # k_number resulted in 7 groups/clusters
plot(agnes,which.plots=2)
The annotations in red were added outside R; they mark the 7 clusters grouped below height 1.
ggplot(df,aes(x,y)) + xlim(0,22) + ylim(0,25) +
geom_point() +
geom_text(aes(label=row.names(df)),hjust=0.5, vjust=-1.5, cex=5)
OK, so there are 7 clusters, which are formed from 13 observations.
I would like to retrieve that number, 13, from the clustering result.
I have read a lot of documentation, but since I'm not very familiar with R or with clustering techniques, I have struggled to work this out. Thanks.
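This should do the trick: cut the tree at the chosen height, then count the observations that end up in a cluster with more than one member.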
# convert to hclust object and obtain cluster assignments for the observations
R> cl <- cutree(as.hclust(agnes), h=1)
R> cl
[1] 1 2 3 2 2 4 5 6 7 8 8 9 10 11 12 13 14 14 12 15 16 17 18 19 20
[26] 21 21 22 22
# tabulate cluster sizes and keep the clusters that hold more than one observation
R> res <- table(cl)
R> res[res > 1]
cl
2 8 12 14 21 22
3 2 2 2 2 2
R> sum(res[res > 1])
[1] 13
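As a cross-check, the same count can probably be read from the merge matrix of the converted tree: negative entries in `merge` denote original observations, and each observation appears there exactly once. The sketch below uses a small made-up data set (`toy` is hypothetical, not the question's data) and assumes a monotone linkage such as average, so that the sorted heights returned by `as.hclust()` line up with the merge order. It uses `<=` because `cutree()` keeps merges at or below the cut.

```r
library(cluster)

# Hypothetical toy data: two tight groups and one isolated point
toy <- data.frame(v = c(0, 0.5, 3, 3.3, 3.5, 10))
hc  <- as.hclust(agnes(toy, metric = "euclidean", method = "average"))

h     <- 1
below <- hc$height <= h   # merges performed at or below the cut-off

# Number of merges below the cut-off
n_merges <- sum(below)

# Negative entries in the merge matrix are original observations,
# so counting them gives the observations involved in those merges
n_obs_merge <- sum(hc$merge[below, , drop = FALSE] < 0)

# The cutree()/table() route from the answer, for comparison
sizes        <- table(cutree(hc, h = h))
n_clusters   <- sum(sizes > 1)        # clusters with more than one member
n_obs_cutree <- sum(sizes[sizes > 1]) # observations inside those clusters

c(merges = n_merges, clusters = n_clusters,
  obs_merge = n_obs_merge, obs_cutree = n_obs_cutree)
```

On the toy data this gives 3 merges, 2 multi-member clusters and 5 observations by either route. It may also explain the question's numbers: `sum(agnes$height < 1)` counts merges (7), while the table above lists six multi-member clusters, because the three-member cluster needs two merges.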
Update: the same steps with the cut-off raised to h=2
R> cl <- cutree(as.hclust(agnes), h=2)
R> cl
[1] 1 2 3 2 2 4 5 6 7 8 8 4 9 10 4 11 11 11 4 12 13 14 15 16 17
[26] 17 17 18 18
R> res <- table(cl)
R> res[res > 1]
cl
2 4 8 11 17 18
3 4 2 3 3 2
R> sum(res[res > 1])
[1] 17
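Since the same three steps are repeated for each cut-off, they could be wrapped in a small function. This is only a sketch: `count_clustered` and `tree` are names made up here, and the data are copied from the question so the block runs on its own.

```r
library(cluster)

# Data copied from the question
x <- c(6.2, 2.3, 0, 1.54, 2.17, 6.11, 0.3, 1.39,
       5.14, 12.52, 12.57, 7.13, 13.71, 11.42,
       8.13, 8.86, 9.97, 10, 8.23, 12.4, 9.51,
       20.56, 17.78, 14.91, 19.17, 17.48, 17.44,
       21.32, 21.24)
y <- c(7.89, 7.63, 5.29, 8.38, 8.37, 10.5, 21.5,
       16.65, 23.76, 1.77, 1.8, 10.49, 14.01,
       10.36, 10.85, 15.02, 14.91, 14.94, 10.76,
       18.58, 23.12, 0, 13.59, 9.68, 17.32, 17.85,
       17.79, 4.13, 4.05)
df <- data.frame(x, y)

# Same clustering as in the question, converted once to an hclust object
tree <- as.hclust(agnes(df, metric = "euclidean", stand = FALSE,
                        method = "average"))

# Hypothetical helper: for each cut-off height, count the observations
# that fall in a cluster with more than one member
count_clustered <- function(tree, heights) {
  sapply(heights, function(h) {
    sizes <- table(cutree(tree, h = h))
    sum(sizes[sizes > 1])
  })
}

counts <- count_clustered(tree, c(1, 2))
counts
```

For the two cut-offs shown above, this should reproduce the 13 and 17 reported in the transcripts.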