I wonder how one can add another layer of important and needed complexity to a correlation matrix heatmap, for example the p value in the manner of significance level stars, in addition to the correlation coefficient R (-1 to 1)?
It was NOT INTENDED in this question to put significance level stars OR the p values as text on each square of the matrix, BUT rather to show the significance level graphically, in an out-of-the-box way, on each square of the matrix. I think only those who enjoy the blessing of INNOVATIVE thinking can win the applause for unravelling this kind of solution, so that we have the best way to represent that added component of complexity in our "half-of-the-truth" correlation matrix heatmaps. I googled a lot but have never seen a proper, or shall I say "eye-friendly", way to represent the significance level PLUS the standard colour shades that reflect the R coefficient.
The reproducible data set is found here:
http://learnr.wordpress.com/2010/01/26/ggplot2-quick-heatmap-plotting/
Please find the R code below:

    library(ggplot2)
    library(plyr)         # might not be needed here, but I think it is a must-have package in R
    library(reshape2)     # to "melt" your dataset
    library(scales)       # it has a "rescale" function which is needed in heatmaps
    library(RColorBrewer) # for convenience of heatmap colours, it reflects your mood sometimes

    nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
    # convert the correlation matrix to a dataframe (the raw data stay in "nba")
    nba.cor <- as.data.frame(cor(nba[2:ncol(nba)]))
    nba.m <- data.frame(row = rownames(nba.cor), nba.cor)  # create a column called "row"
    rownames(nba.m) <- NULL                                # get rid of row names
    nba.m <- melt(nba.m, id.vars = "row")                  # long format: row, variable, value

    # this can be customized to put the correlations in categories using the "cut" function
    # with appropriate labels to show them in the legend; this column is now discrete, not continuous
    nba.m$value <- cut(nba.m$value,
                       breaks = c(-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1),
                       include.lowest = TRUE,
                       labels = c("(-0.75,-1)", "(-0.5,-0.75)", "(-0.25,-0.5)", "(0,-0.25)",
                                  "(0,0.25)", "(0.25,0.5)", "(0.5,0.75)", "(0.75,1)"))

    # reorder the "row" column, which is used as the x axis in the plot, by converting it to an ordered factor
    nba.m$row <- factor(nba.m$row, levels = rev(unique(as.character(nba.m$variable))))

    # now plotting
    ggplot(nba.m, aes(row, variable)) +
      geom_tile(aes(fill = value), colour = "black") +
      scale_fill_brewer(palette = "RdYlGn", name = "Correlation")  # here comes the RColorBrewer package
    # If you ask me why I chose this palette, I would say: look at the battery charge indicator of
    # your mobile or, for example, your shaver. Won't it be red when it gets low, and back to green
    # when charged? This was the inspiration for this colour set.
The matrix correlation heatmap should look like this:
Hints and ideas to enhance the solution:
- This code might be useful to have an idea about the significance level stars taken from this website:
http://ohiodata.blogspot.de/2012/06/correlation-tables-in-r-flagged-with.html
R code:
mystars <- ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "* ", " "))) # so 4 categories
- The significance level can be added as colour intensity to each square, like the alpha aesthetic, but I don't think this will be easy to interpret and to capture
- Another idea would be to have 4 different sizes of squares corresponding to the stars, giving the smallest to the non-significant ones and increasing up to a full-size square for the highest number of stars
- Another idea is to include a circle inside the significant squares, where the thickness of the circle's line corresponds to the level of significance (the 3 remaining categories), all of them in one colour
- Same as above, but fixing the line thickness while giving 3 colours to the 3 remaining significance levels
- Maybe you will come up with better ideas, who knows?
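To make the second hint concrete, here is a minimal sketch of the "square size" idea, not a finished solution. It assumes the built-in mtcars data as a stand-in for the NBA set (so it runs without a download), and the names long, side_by_level and p_size as well as the side lengths are made up for illustration. The p values come from base R's cor.test, and the four categories match the nested ifelse() above.

```r
library(ggplot2)

# Stand-in data (assumption): six numeric columns of the built-in mtcars set
dat  <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
vars <- names(dat)

# One row per ordered pair of variables, with Spearman r and its p value
long <- expand.grid(row = vars, variable = vars, stringsAsFactors = FALSE)
long$r <- mapply(function(a, b) cor(dat[[a]], dat[[b]], method = "spearman"),
                 long$row, long$variable, USE.NAMES = FALSE)
long$p <- mapply(function(a, b) {
  if (a == b) return(NA_real_)  # no test on the diagonal
  cor.test(dat[[a]], dat[[b]], method = "spearman", exact = FALSE)$p.value
}, long$row, long$variable, USE.NAMES = FALSE)

# Same four categories as the nested ifelse(): p < .001, < .01, < .05, otherwise
long$stars <- cut(long$p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                  labels = c("***", "**", "*", "n.s."), right = FALSE)

# Hypothetical side lengths per category, as a fraction of a full cell
side_by_level <- c("***" = 1, "**" = 0.8, "*" = 0.6, "n.s." = 0.35)
long$side <- unname(side_by_level[as.character(long$stars)])
long$side[is.na(long$side)] <- 1  # diagonal: draw a full square

# Tile colour shows r, tile size shows the significance category
p_size <- ggplot(long, aes(row, variable)) +
  geom_tile(aes(fill = r, width = side, height = side), colour = "black") +
  scale_fill_gradient2(low = "red", mid = "lightyellow", high = "darkgreen",
                       midpoint = 0, limits = c(-1, 1), name = "Correlation")
p_size
```

The continuous red-to-green fill is only a convenience here; the binned RdYlGn palette from the code above could be used instead by cutting r first.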
This is just an attempt to move towards the final solution. I plotted the stars here as an indicator of the solution, but as I said, the aim is to find a graphical solution that can speak better than the stars. I just used geom_point and alpha to indicate the significance level, but the problem is that the NAs (which include the non-significant values as well) show up like the three-star level of significance. How can I fix that? I think that using one colour might be more eye-friendly than using many colours, and it avoids burdening the plot with many details for the eyes to resolve. Thanks in advance.
Here is the plot of my first attempt:
Or might this be better?!
I think the best till now is the one below, until you come up with something better !
As requested, the code below is for the last heatmap:

    # "nba" must hold the raw data here, so re-read it in case it was overwritten above
    nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")

    # Function to get the p values into a whole matrix, not half;
    # here it is Spearman, you can change it to Kendall or Pearson
    cor.prob.all <- function(X, dfr = nrow(X) - 2) {
      R <- cor(X, use = "pairwise.complete.obs", method = "spearman")
      r2 <- R^2
      Fstat <- r2 * dfr / (1 - r2)
      R <- 1 - pf(Fstat, 1, dfr)
      R[row(R) == col(R)] <- NA
      R
    }

    # Change matrices to dataframes ("method" belongs to cor(), not to as.data.frame())
    nbar <- as.data.frame(cor(nba[2:ncol(nba)], method = "spearman"))  # correlation coefficients
    nbap <- as.data.frame(cor.prob.all(nba[2:ncol(nba)]))              # p values

    # Reset rownames
    nbar <- data.frame(row = rownames(nbar), nbar)  # create a column called "row"
    rownames(nbar) <- NULL
    nbap <- data.frame(row = rownames(nbap), nbap)  # create a column called "row"
    rownames(nbap) <- NULL

    # Melt
    nbar.m <- melt(nbar, id.vars = "row")
    nbap.m <- melt(nbap, id.vars = "row")

    # Classify (you can classify differently for nbar and for nbap also)
    nbar.m$value2 <- cut(nbar.m$value,
                         breaks = c(-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1),
                         include.lowest = TRUE,
                         labels = c("(-0.75,-1)", "(-0.5,-0.75)", "(-0.25,-0.5)", "(0,-0.25)",
                                    "(0,0.25)", "(0.25,0.5)", "(0.5,0.75)", "(0.75,1)"))  # the labels for the legend
    nbap.m$value2 <- cut(nbap.m$value, breaks = c(-Inf, 0.001, 0.01, 0.05),
                         labels = c("***", "** ", "* "))  # p values above 0.05 become NA

    # Add the p value and its cut to the first dataset of R coefficients
    nbar.m <- cbind.data.frame(nbar.m, nbap.m$value, nbap.m$value2)
    names(nbar.m)[5] <- "valuep"   # change the column names of the dataframe
    names(nbar.m)[6] <- "signif."
    nbar.m$row <- factor(nbar.m$row, levels = rev(unique(as.character(nbar.m$variable))))  # reorder the variable factor

    # Plotting the matrix correlation heatmap
    # Set options for a blank panel (theme()/element_*() replace the removed opts()/theme_*())
    po.nopanel <- list(theme(panel.background = element_blank(),
                             panel.grid.minor = element_blank(),
                             panel.grid.major = element_blank()))
    pa <- ggplot(nbar.m, aes(row, variable)) +
      geom_tile(aes(fill = value2), colour = "white") +
      scale_fill_brewer(palette = "RdYlGn", name = "Correlation") +  # RColorBrewer package
      theme(axis.text.x = element_text(angle = -90)) +
      po.nopanel
    pa  # check the first plot

    # Adding the significance level stars using geom_text
    pp <- pa + geom_text(aes(label = signif.), size = 2, na.rm = TRUE)  # you can play with the size
    pp

    # Workaround for the alpha aesthetic, if alpha is a good way to represent the significance level;
    # the same workaround can be applied to the size aesthetic in ggplot2 as well.
    # Using alpha to show significance is a little bit problematic, because we want alpha to be low
    # when the p value is high and vice versa, which can't be done without a workaround.
    # The to=c(0.1,0.9) range avoids problems with the reciprocal taken in the next step.
    nbar.m$signif. <- rescale(as.numeric(nbar.m$signif.), to = c(0.1, 0.9))
    # alpha now behaves as wanted, except that the NA values still show as if they had
    # the three-star level; how to fix that?
    nbar.m$signif. <- as.factor(0.09 / nbar.m$signif.)

    # Adding the alpha aesthetic in geom_point in the shape of squares (you can improve here).
    # You can remove this step; its result is one of the layers in the green heatmap above.
    # Shape 22 is again a square, and you can play with the size accordingly.
    pp <- pa + geom_point(data = nbar.m, aes(alpha = signif.), shape = 22, size = 5,
                          colour = "darkgreen", na.rm = TRUE, show.legend = FALSE)
    pp
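One possible way around the NA problem noted in the code comments, offered only as a sketch: give the non-significant p values their own explicit level instead of letting cut() turn them into NA, set the alpha of each level by hand, and drop the diagonal from the overlay layer. The built-in mtcars data stands in for the NBA set here (an assumption, so that the example runs offline), and the names long, alpha_values, overlay and p_alpha plus the alpha values themselves are hypothetical.

```r
library(ggplot2)

# Stand-in data (assumption): six numeric columns of the built-in mtcars set
dat  <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
vars <- names(dat)

# One row per ordered pair of variables, with Spearman r and its p value
long <- expand.grid(row = vars, variable = vars, stringsAsFactors = FALSE)
long$r <- mapply(function(a, b) cor(dat[[a]], dat[[b]], method = "spearman"),
                 long$row, long$variable, USE.NAMES = FALSE)
long$p <- mapply(function(a, b) {
  if (a == b) return(NA_real_)  # no test on the diagonal
  cor.test(dat[[a]], dat[[b]], method = "spearman", exact = FALSE)$p.value
}, long$row, long$variable, USE.NAMES = FALSE)

# A fourth break at Inf keeps non-significant cells as "n.s." instead of NA
long$signif <- cut(long$p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", "n.s."), right = FALSE)

# Hypothetical alpha per level: strongest for "***", invisible for "n.s."
alpha_values <- c("***" = 0.9, "**" = 0.6, "*" = 0.3, "n.s." = 0)

# Only the diagonal is NA now; leave it out of the overlay layer
overlay <- long[!is.na(long$signif), ]

p_alpha <- ggplot(long, aes(row, variable)) +
  geom_tile(aes(fill = r), colour = "white") +
  scale_fill_gradient2(low = "red", mid = "lightyellow", high = "darkgreen",
                       midpoint = 0, limits = c(-1, 1), name = "Correlation") +
  geom_point(data = overlay, aes(alpha = signif), shape = 22, size = 5,
             colour = "darkgreen", fill = "darkgreen", show.legend = FALSE) +
  scale_alpha_manual(values = alpha_values)
p_alpha
```

Because alpha is assigned per level with scale_alpha_manual, the rescale-and-reciprocal step is not needed in this variant.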
I hope that this can be a step forward towards getting there! Please note:
- Some suggested classifying or cutting the R coefficient differently. OK, we can do that of course, but we still want to show the audience the significance level GRAPHICALLY instead of troubling the eye with the star levels. Can we ACHIEVE that in principle or not?
- Some suggested cutting the p values differently. OK, this can be a choice if we fail to show the 3 levels of significance without troubling the eye. Then it might be better to show significant/non-significant without levels.
- You might come up with a better idea than the above workaround in ggplot2 for the alpha and size aesthetics. Hope to hear from you soon!
- The question is not answered yet, waiting for an innovative solution!
- Interestingly, the "corrplot" package does it! I came up with the graph below using this package. PS: the crossed squares are the non-significant ones, at a significance level of 0.05. But how can we translate this to ggplot2, can we?!
- Or you can draw circles and hide the non-significant ones? How to do this in ggplot2?!
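As a rough sketch of how the corrplot-style display from the last two points might look in ggplot2 (not a port of corrplot itself): circles are sized by |r| and coloured by r, and the pairs with p >= 0.05 get a cross on top. The built-in mtcars data is again an assumed stand-in, and long, off_diag, significant and p_circ are names made up for this example.

```r
library(ggplot2)

# Stand-in data (assumption): six numeric columns of the built-in mtcars set
dat  <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
vars <- names(dat)

# One row per ordered pair of variables, with Spearman r and its p value
long <- expand.grid(row = vars, variable = vars, stringsAsFactors = FALSE)
long$r <- mapply(function(a, b) cor(dat[[a]], dat[[b]], method = "spearman"),
                 long$row, long$variable, USE.NAMES = FALSE)
long$p <- mapply(function(a, b) {
  if (a == b) return(NA_real_)  # no test on the diagonal
  cor.test(dat[[a]], dat[[b]], method = "spearman", exact = FALSE)$p.value
}, long$row, long$variable, USE.NAMES = FALSE)

# Significant at the 0.05 level; the diagonal is dropped from the plot
long$significant <- !is.na(long$p) & long$p < 0.05
off_diag <- long[long$row != long$variable, ]

p_circ <- ggplot(off_diag, aes(row, variable)) +
  geom_tile(fill = "white", colour = "grey80") +
  # circle area follows |r|, circle colour follows r
  geom_point(aes(size = abs(r), fill = r), shape = 21, colour = "black") +
  # cross out the non-significant pairs
  geom_point(data = off_diag[!off_diag$significant, ], shape = 4, size = 6) +
  scale_size_area(max_size = 10, guide = "none") +
  scale_fill_gradient2(low = "red", mid = "lightyellow", high = "darkgreen",
                       midpoint = 0, limits = c(-1, 1), name = "Correlation") +
  coord_fixed()
p_circ
```

To hide the non-significant circles instead of crossing them, the first geom_point layer could be given data = off_diag[off_diag$significant, ] and the cross layer removed.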