
How to change dendrogram labels in r

I have a dendrogram in R, based on hierarchical clustering with hclust. I colour labels that differ in different colours, but when I try to change the labels of my dendrogram (to values from the data frame the clustering is based on) using dendrogram = dendrogram %>% set("labels", dataframe$column), the labels are replaced but end up in the wrong positions. For example:

My dendrogram looks like this:

 ___|___
|      _|_
|     |   | 
|     1   0
2

When I now try changing the labels as described above, the labels are changed, but they are applied from left to right in their order in the data frame. Assume my original data frame looks like this:

df:
   Column1  Column2
0     1        A
1     2        B
2     3        C

what I want to have is this:

    ___|___
   |      _|_
   |     |   | 
   |     B   A
   C

But what I actually get is:

    ___|___
   |      _|_
   |     |   | 
   |     B   C
   A   

The clustering of the data and its conversion into a dendrogram were done as follows:

> library(stringdist)  # provides stringdistmatrix()
> d <- stringdistmatrix(df$Column1, df$Column1)
> cl <- hclust(as.dist(d))
> dend <- as.dendrogram(cl)

Can anybody tell me how I can label my dendrogram with the values of another column based on the index?

sequence_hard asked Nov 09 '15 14:11

2 Answers

The dendextend package allows you to directly update dendrograms (as well as hclust), by using the following:

x <- c(1:5)
dend <- as.dendrogram(hclust(dist(x)))

if(!require(dendextend)) install.packages("dendextend")
library("dendextend")

labels(dend)
labels(dend) <- c(21:25)
labels(dend)
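
Note that assigning labels this way fills the leaves from left to right as drawn, not by original observation index. As a rough sketch in base R (the name new_labels is only illustrative and not part of the answer above), order.dendrogram() returns the original index of each leaf in plotting order, so it can be used to permute a label vector before assignment:

```r
x <- c(1:5)
dend <- as.dendrogram(hclust(dist(x)))

# Hypothetical replacement labels, given in the original observation order
new_labels <- c(21:25)

# Original index of each leaf, read left to right in the plot
ord <- order.dendrogram(dend)

# Labels permuted into leaf order: leaf i carries the label of observation ord[i]
labels_in_leaf_order <- new_labels[ord]
print(labels_in_leaf_order)

# With dendextend loaded, this permuted vector would be the one to assign:
# labels(dend) <- labels_in_leaf_order
```

Permuting first means each new label should land on the leaf that belongs to the same row of the source data, whatever order the clustering happens to produce.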
Tal Galili answered Oct 19 '22 05:10


In the hclust object you've created, cl, you have an element named "order" that contains the order in which the elements are in the dendrogram.

If you want to change the labels, you need to put the new labels in the same order (cl$order), so the "new" dendrogram is right:

df$Column2[cl$order]
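
Putting this together for a data frame shaped like the one in the question, a minimal sketch might look as follows. It uses dist() on a numeric column in place of stringdistmatrix() purely to avoid the extra package dependency, and the example values are made up; the reordering logic should be the same either way. It also shows an alternative that should give the same result: setting cl$labels (in original row order) before converting to a dendrogram.

```r
# Made-up example data with the same column names as in the question
df <- data.frame(Column1 = c(1, 2, 10),
                 Column2 = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

cl <- hclust(dist(df$Column1))

# Option 1: permute the new labels into leaf (left-to-right) order
leaf_labels <- df$Column2[cl$order]

# Option 2: attach the labels to the hclust object in row order;
# as.dendrogram() then places each label on its matching leaf
cl$labels <- df$Column2
dend <- as.dendrogram(cl)

print(leaf_labels)
print(labels(dend))
```

Both printed vectors should agree, since labels(dend) reads the leaf labels from left to right.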
Cath answered Oct 19 '22 05:10