differences in heatmap/clustering defaults in R (heatplot versus heatmap.2)?

People also ask

Whats the difference between the dendrograms on the columns and rows of a heatmap?

Row dendrograms show the distance (or similarity) between rows and which nodes each row belongs to as a result of the clustering calculation. Column dendrograms show the distance (or similarity) between the variables (the selected cell value columns).

What package is heatmap 2 in R?

A heatmap is a graphical representation of data where the values are represented with colors. The heatmap. 2 function from the gplots package allows to produce highly customizable heatmaps.

What does clustering mean in heatmap?

Cluster heatmaps are commonly used in biology and related fields to reveal hierarchical clusters in data matrices. Heatmaps visualize a data matrix by drawing a rectangular grid corresponding to rows and columns in the matrix, and coloring the cells by their values in the data matrix.

What does a heatmap show in R?

A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. This page displays many examples built with R, both static and interactive. The heatmap() function is natively provided in R.

The main differences between heatmap.2 and heatplot functions are the following:

heatmap.2, as default uses euclidean measure to obtain distance matrix and complete agglomeration method for clustering, while heatplot uses correlation, and average agglomeration method, respectively.
heatmap.2 computes the distance matrix and runs clustering algorithm before scaling, whereas heatplot (when dualScale=TRUE) clusters already scaled data.
heatmap.2 reorders the dendrogram based on the row and column mean values, as described here.

Default settings (p. 1) can be simply changed within heatmap.2, by supplying custom distfun and hclustfun arguments. However p. 2 and 3 cannot be easily addressed, without changing the source code. Therefore heatplot function acts as a wrapper for heatmap.2. First, it applies necessary transformation to the data, calculates distance matrix, clusters the data, and then uses heatmap.2 functionality only to plot the heatmap with the above parameters.

The dualScale=TRUE argument in the heatplot function, applies only row-based centering and scaling (description). Then, it reassigns the extremes (description) of the scaled data to the zlim values:

z <- t(scale(t(data)))
zlim <- c(-3,3)
z <- pmin(pmax(z, zlim[1]), zlim[2])

In order to match the output from the heatplot function, I would like to propose two solutions:

I - add new functionality to the source code -> heatmap.3

The code can be found here. Feel free to browse through revisions to see the changes made to heatmap.2 function. In summary, I introduced the following options:

z-score transformation is performed prior to the clustering: scale=c("row","column")
the extreme values can be reassigned within the scaled data: zlim=c(-3,3)
option to switch off dendrogram reordering: reorder=FALSE

An example:

# require(gtools)
# require(RColorBrewer)
cols <- colorRampPalette(brewer.pal(10, "RdBu"))(256)

distCor <- function(x) as.dist(1-cor(t(x)))
hclustAvg <- function(x) hclust(x, method="average")

heatmap.3(data, trace="none", scale="row", zlim=c(-3,3), reorder=FALSE,
          distfun=distCor, hclustfun=hclustAvg, col=rev(cols), symbreak=FALSE)

enter image description here

II - define a function that provides all the required arguments to the heatmap.2

If you prefer to use the original heatmap.2, the zClust function (below) reproduces all the steps performed by heatplot. It provides (in a list format) the scaled data matrix, row and column dendrograms. These can be used as an input to the heatmap.2 function:

# depending on the analysis, the data can be centered and scaled by row or column. 
# default parameters correspond to the ones in the heatplot function. 
distCor <- function(x) as.dist(1-cor(x))
zClust <- function(x, scale="row", zlim=c(-3,3), method="average") {
    if (scale=="row") z <- t(scale(t(x)))
    if (scale=="col") z <- scale(x)
    z <- pmin(pmax(z, zlim[1]), zlim[2])
    hcl_row <- hclust(distCor(t(z)), method=method)
    hcl_col <- hclust(distCor(z), method=method)
    return(list(data=z, Rowv=as.dendrogram(hcl_row), Colv=as.dendrogram(hcl_col)))
}

z <- zClust(data)

# require(RColorBrewer)
cols <- colorRampPalette(brewer.pal(10, "RdBu"))(256)

heatmap.2(z$data, trace='none', col=rev(cols), Rowv=z$Rowv, Colv=z$Colv)

Few additional comments regarding heatmap.2(3) functionality:

symbreak=TRUE is recommended when scaling is applied. It will adjust the colour scale, so it breaks around 0. In the current example, the negative values = blue, while the positive values = red.
col=bluered(256) may provide an alternative colouring solution, and it doesn't require RColorBrewer library.

Related questions
                            
                                Converting a factor to numeric without losing information R (as.numeric() doesn't seem to work) [duplicate]
                            
                                Convert HTML-table to Excel using VBA
                            
                                R ggplot2: Change the spacing between the legend and the panel
                            
                                dynamically add plots to web page using shiny
                            
                                Creating a density histogram in ggplot2?
                            
                                Warning message: "missing values in resampled performance measures" in caret train() using rpart
                            
                                where is a good tutorial on creating an R package with C++ source code? [closed]
                            
                                R CMD on Windows 7 Error - "R" is not recognized as an internal or external command,
                            
                                How to do multiple-line strings in R?
                            
                                R grep: Match one string against multiple patterns
                            
                                Why does as.matrix add extra spaces when converting numeric to character?
                            
                                R Packages - What is the file 'zzz.R' used for?
                            
                                How to set up conda-installed R for use with RStudio?
                            
                                Row/column counter in 'apply' functions
                            
                                How to index an element of a list object in R
                            
                                Install R packages using docker file
                            
                                setting seed locally (not globally) in R
                            
                                What does this mean: unable to find an inherited method for function ‘A’ for signature ‘"B"’
                            
                                suppress messages displayed by "print" instead of "message" or "warning" in R
                            
                                removing all the space between two ggplots combined with grid.arrange

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

differences in heatmap/clustering defaults in R (heatplot versus heatmap.2)?

Tags:

r

cluster-analysis

heatmap

bioconductor

hierarchical-clustering

People also ask

Recent Activity

Donate For Us