I did a hierarchical cluster for a project. I have 300 observations each of 20 variables. I indexed all the variables so that each variable is between 0 and 1, a larger value being better.
I used the following code to create a cluster plot.
d_data <- dist(all_data[,-1])
d_data_ind <- dist(data_ind[,-1])
hc_data_ind <- hclust(d_data_ind, method = "complete")
dend <- as.dendrogram(hc_data_ind)
plot(dend)
Now the labels of the nodes are in row names, the numbers 1 to 300 (see top pic). During the analysis, I removed the first column of the data frame which is labeled "geography" (see bottom pic), because they were city names in text and would screw up the analysis. But I really need to get the city names on the cluster plot in their right spots, because I need to choose a list of cities based on the results.
What code should I write to insert the city names in the "geography" column into this plot, corresponding to their row names?
As you can see from the data frame (bottom pic), all the city names are in alphabetical order, neatly in ascending order, just like the row names. I'm sure it isn't hard to put the city names onto the plot, I just can't find it by googling and asking around.
I think that what you are asking is "how can I decide on the labels in a dendrogram". So this has two parts. For example, let's use the simple data of the numbers c(1,2,5,6)
1) When you create the hclust using dist, it uses the names of the items. And if they don't exist then it uses a running index. For example:
x <- c(1,2,5,6)
d1 <- as.dendrogram(hclust(dist(x)))
plot(d1)
This is obviously a problem since the items we have are 1,2,5,6 and not 1:4! So how can we fix this? One way is to update the names before clustering. For example:
x <- c(1,2,5,6)
names(x) <- x
x
d2 <- as.dendrogram(hclust(dist(x)))
plot(d2)
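Applied to the data frame in the question, the same idea might look like the sketch below. The small data_ind built here is a made-up stand-in (the city names and the var1/var2 columns are hypothetical); only the "geography" column name and the dist/hclust calls come from the question. It also assumes the city names are unique, since row names must be.

```r
# Hypothetical stand-in for the asker's data_ind: a text "geography" column
# followed by numeric variables scaled to [0, 1].
set.seed(1)
data_ind <- data.frame(
  geography = c("Austin", "Boston", "Chicago", "Denver", "El Paso"),
  var1 = runif(5),
  var2 = runif(5),
  stringsAsFactors = FALSE
)

# Use the city names as row names so dist() and hclust() carry them through.
# This fails if any city name is duplicated.
rownames(data_ind) <- data_ind$geography

# Same pipeline as in the question; the text column is still dropped
# from the distance calculation.
d_data_ind <- dist(data_ind[, -1])
hc_data_ind <- hclust(d_data_ind, method = "complete")
dend <- as.dendrogram(hc_data_ind)
plot(dend)
```

The leaves should now show city names instead of 1 to 300, because the labels travel with the rows rather than being attached afterwards.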
I believe this basically solves your problem (and frankly, doesn't require dendextend). But if you want to update the text AFTER creating the dendrogram - read on:
2) The dendextend package allows you to update the labels of a dendrogram. But you need to make sure you are using the correct order (since the order of the original vector, and that of the labels in the tree are not the same!). Here is how it can be done:
if (!require(dendextend)) install.packages("dendextend")
library(dendextend)
x <- c(1,2,5,6)
d3 <- as.dendrogram(hclust(dist(x)))
labels(d3) <- x[order.dendrogram(d3)]
plot(d3)
Here is how we would do it for a more complex data object (where we may not want to play with the row names of the object, but to update the dendrogram):
if (!require(dendextend)) install.packages("dendextend")
library(dendextend)
x <- CO2[,4:5]
d4 <- as.dendrogram(hclust(dist(x)))
labels(d4) <- apply(CO2[,1:3], 1, paste, collapse = "_")[order.dendrogram(d4)]
d4 <- set(d4, "labels_cex", 0.6)
d4 <- color_branches(d4, k = 3)
par(mar = c(3,0,0,6))
plot(d4, horiz = TRUE)
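If you would rather not depend on dendextend for relabeling after the fact, one base R option is to set the labels on the hclust object before converting it to a dendrogram. The sketch below uses made-up values and city names chosen so that the tree order differs from the input order; it relies on hclust keeping its labels in input order, so no reordering is needed at assignment time.

```r
# Hypothetical example: values whose tree order differs from input order.
x <- c(10, 1, 11, 2)
city <- c("Austin", "Boston", "Chicago", "Denver")  # made-up labels, input order

hc <- hclust(dist(x))
hc$order                      # leaf positions, left to right, as input indices

# hclust stores labels in input order, so assign them without reordering.
hc$labels <- city
dend <- as.dendrogram(hc)

labels(dend)                  # leaf labels, now in tree order
plot(dend)
```

This is the mirror image of the dendextend approach: there the labels are assigned in tree order (hence order.dendrogram), here they are assigned in input order and the conversion does the reordering.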