Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to expand the dendogram in heatmap.2

I have the following heatmap with the dendrogram like this.

The complete data are here.

The problem is that the dendrogram on the left is squished. How can I unsquish (expand) it without altering the column size of the heatmap?

enter image description here

It is generated with this code:

#!/usr/bin/Rscript
library(gplots);
library(RColorBrewer);


plot_hclust  <- function(inputfile,clust.height,type.order=c(),row.margins=70) {

    # Read data
    dat.bcd <- read.table(inputfile,na.strings=NA, sep="\t",header=TRUE);


    rownames(dat.bcd) <- do.call(paste,c(dat.bcd[c("Probes","Gene.symbol")],sep=" "))
    dat.bcd <- dat.bcd[,!names(dat.bcd) %in% c("Probes","Gene.symbol")] 
    dat.bcd <- dat.bcd

    # Clustering and distance function
    hclustfunc <- function(x) hclust(x, method="complete")
    distfunc <- function(x) dist(x,method="maximum")


    # Select based on FC, as long as any of them >= anylim

    anylim <- 2.0
    dat.bcd <- dat.bcd[ apply(dat.bcd, 1,function(x) any (x >= anylim)), ]


    # Clustering functions
    height <- clust.height; 

    # Define output file name
    heatout <- paste("tmp.pafc.heat.",anylim,".h",height,".pdf",sep="");


    # Compute distance and clusteirn function
    d.bcd <- distfunc(dat.bcd)
    fit.bcd <- hclustfunc(d.bcd)


    # Cluster by height
    #cutree and rect.huclust has to be used in tandem
    clusters <- cutree(fit.bcd, h=height) 
    nofclust.height <-  length(unique(as.vector(clusters)));

    myorder <- colnames(dat.bcd); 
    if (length(type.order)>0) {
     myorder <- type.order
    }

    # Define colors
    #hmcols <- rev(brewer.pal(11,"Spectral"));
    hmcols <- rev(redgreen(2750));
    selcol <- colorRampPalette(brewer.pal(12,"Set3"))
    selcol2 <- colorRampPalette(brewer.pal(9,"Set1"))
    sdcol= selcol(5);
    clustcol.height = selcol2(nofclust.height);

    # Plot heatmap
    pdf(file=heatout,width=20,height=50); # for FC.lim >=2
    heatmap.2(as.matrix(dat.bcd[,myorder]),Colv=FALSE,density.info="none",lhei=c(0.1,4),dendrogram="row",scale="row",RowSideColors=clustcol.height[clusters],col=hmcols,trace="none", margin=c(30,row.margins), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=0.3);
    dev.off();


}
#--------------------------------------------------
# ENd of functions 
#-------------------------------------------------- 

plot_hclust("http://pastebin.com/raw.php?i=ZaGkPTGm",clust.height=3,row.margins=70);
like image 692
pdubois Avatar asked Dec 20 '22 17:12

pdubois


1 Answers

In your case the data has long tail, which is expected for gene expression data (lognormal).

data <- read.table(file='http://pastebin.com/raw.php?i=ZaGkPTGm', 
                   header=TRUE, row.names=1)

mat <- as.matrix(data[,-1]) # -1 removes the first column containing gene symbols

As you can see from the quantile distribution that the genes with the highest expression extend the range from 1.5 to above 300.

quantile(mat)

#     0%     25%     50%     75%    100% 
#  0.000   0.769   1.079   1.544 346.230 

When the hierarchical clustering is performed on unscaled data the resulting dendrogram may show bias towards the values with the highest expression, as seen in your example. This merits either a logarithmic or z-score transformation, among many (reference). Your dataset contains values == 0, which is a problem for log-transformation since log(0) is undefined.

Z-score transformation (reference) is implemented within heatmap.2, but it's important to note that the function computes the distance matrix and runs clustering algorithm before scaling the data. Hence the option scale='row' doesn't influence the clustering results, see my earlier post (differences in heatmap/clustering defaults in R) for more details.

I would propose that you scale your data before running heatmap.2:

# scale function transforms columns by default hence the need for transposition.
z <- t(scale(t(mat))) 

quantile(z)

#         0%        25%        50%        75%       100% 
# -2.1843994 -0.6646909 -0.2239677  0.3440102  2.2640027 

# set custom distance and clustering functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

# obtain the clusters
fit <- hclustfunc(distfunc(z))
clusters <- cutree(fit, 5) 

# require(gplots)
pdf(file='heatmap.pdf', height=50, width=10)
heatmap.2(z, trace='none', dendrogram='row', Colv=F, scale='none', 
             hclust=hclustfunc, distfun=distfunc, col=greenred(256), symbreak=T,
             margins=c(10,20), keysize=0.5, labRow=data$Gene.symbol,
             lwid=c(1,0.05,1), lhei=c(0.03,1), lmat=rbind(c(5,0,4),c(3,1,2)),
             RowSideColors=as.character(clusters))
dev.off()

Also, see the additional posts here and here, which explain how to set the layout of the heatmap via lmat, lwid and lhei parameters.

The resulting heatmap is shown below (row and column labels are omitted):

enter image description here

like image 103
TWL Avatar answered Jan 14 '23 11:01

TWL