Pairwise graphical comparison of several distributions

This is an edited version of a previous question.

We are given an m by n table of n observations (samples) over m variables (genes, etc.), and we want to study how the variables behave between each pair of observations, for instance to find the two observations with the highest positive or negative correlation. For this purpose I have seen a great chart in the Nature paper by Stadler et al. (2011):

[Image: pairwise comparison chart from Stadler et al. (2011)]

Here is a sample dataset that could be used:

m <- 1000
samples <- data.frame(unif1 = runif(m), unif2 = runif(m, 1, 2), norm1 = rnorm(m), 
                      norm2 = rnorm(m, 1), norm3 = rnorm(m, 0, 5))
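As a numerical complement to the chart, the pair of columns with the highest positive or negative correlation can be read off the correlation matrix. The following is only a minimal base-R sketch: the helper name `extreme_pairs` is hypothetical, the seed is arbitrary, and it assumes every column is numeric.

```r
# Hypothetical helper: list each pair of columns once with its Pearson correlation
extreme_pairs <- function(df) {
  cm  <- cor(df)                                 # full correlation matrix
  idx <- which(upper.tri(cm), arr.ind = TRUE)    # each unordered pair exactly once
  out <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                    var2 = colnames(cm)[idx[, "col"]],
                    r    = cm[idx],
                    stringsAsFactors = FALSE)
  out[order(out$r, decreasing = TRUE), ]         # most positive first, most negative last
}

set.seed(1)  # arbitrary seed, only for reproducibility
m <- 1000
samples <- data.frame(unif1 = runif(m), unif2 = runif(m, 1, 2), norm1 = rnorm(m),
                      norm2 = rnorm(m, 1), norm3 = rnorm(m, 0, 5))
pairs_by_r <- extreme_pairs(samples)
head(pairs_by_r, 1)   # pair with the highest positive correlation
tail(pairs_by_r, 1)   # pair with the most negative correlation
```

Because the sample columns are drawn independently, all of these correlations should be close to zero; the ranking becomes informative on real data.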

I have already tested gpairs(samples) from the gpairs package, which produces the plot below. It's a good start, but it has no option to put correlation coefficients in the upper-right section or density plots in the lower corner:

[Image: output of gpairs(samples)]

Next I used ggpairs(samples, lower=list(continuous="density")) from the GGally package (thanks to @LucianoSelzer for a comment below). Now we have the correlations in the upper corner and the densities in the lower corner, but the diagonal bar plots are missing and the density plots are not heatmap-shaped.

[Image: output of ggpairs(samples, lower=list(continuous="density"))]

Any ideas on how to get closer to the desired picture (the first one)?


asked Mar 19 '13 17:03

Ali

2 Answers

You could try combining several different plotting methods into one figure. Here's an example, which could be tweaked as needed:

cors<-round(cor(samples),2) #correlations

# build the layout matrix; upper.tri/lower.tri assignments fill cells column by column
laymat<-diag(1:5) #histograms
laymat[upper.tri(laymat)]<-6:15 #correlations
laymat[lower.tri(laymat)]<-16:25 #heatmaps

layout(laymat) #define layout using laymat

par(mar=c(2,2,2,2)) #define margins etc.

# Draw histograms, tweak arguments of hist to make nicer figures
for(i in 1:5) 
  hist(samples[,i],main=names(samples)[i])

# Write correlations to the upper triangle of the graph
# Loop column by column to match the order in which laymat was filled
# Again, tweak accordingly
for(j in 2:5)
  for(i in 1:(j-1)){
    plot(-1:1,-1:1, type = "n",xlab="",ylab="",xaxt="n",yaxt="n")
    text(x=0,y=0,labels=paste(cors[i,j]),cex=2)
    }

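# Plot heatmaps: kde2d from MASS estimates the 2-D density,
# image draws it as a heatmap
# Loop column by column to match the order in which laymat was filled;
# the column variable goes on the x-axis and the row variable on the y-axis
library(MASS)
for(j in 1:4)
  for(i in (j+1):5){
     k <- kde2d(samples[,j],samples[,i])
     image(k,col=heat.colors(1000))
    }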

Edit: Corrected indexing on the last loop.

[Image: pairwise plot]
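One detail worth checking with this approach is the order in which the layout() cells are numbered: assigning into laymat[upper.tri(laymat)] or laymat[lower.tri(laymat)] fills the cells column by column, so the plotting loops have to visit the pairs in that same order. A small sketch (the helper name `panel_order` is made up for illustration) lists which (row, column) cell each panel number lands on:

```r
# Hypothetical helper: map layout panel numbers to (row, col) cells for p variables
panel_order <- function(p) {
  laymat <- diag(seq_len(p))                                # panels 1..p on the diagonal
  n_off  <- p * (p - 1) / 2                                 # number of cells per triangle
  laymat[upper.tri(laymat)] <- p + seq_len(n_off)           # filled column by column
  laymat[lower.tri(laymat)] <- p + n_off + seq_len(n_off)   # filled column by column
  cells <- cbind(row = as.vector(row(laymat)), col = as.vector(col(laymat)))
  cells <- cells[order(as.vector(laymat)), , drop = FALSE]  # sort cells by panel number
  list(laymat = laymat,
       cells  = data.frame(panel = sort(as.vector(laymat)), cells))
}

ord <- panel_order(5)
ord$laymat            # the matrix that would be passed to layout()
ord$cells[6:10, ]     # the first correlation panels and the cells they occupy
```

For five variables, panel 8 sits in row 2, column 3 rather than row 1, column 4, which is why a loop that runs row by row would put values in the wrong cells.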

answered Oct 29 '22 02:10

Jouni Helske

You can do something like this using three packages and two custom functions, as below:

cor_fun computes and displays the correlation for each panel in the upper triangle. my_fn draws the density heatmap for each panel in the lower triangle.

Both are passed to ggpairs from GGally.

library(GGally)
library(ggplot2)
library(RColorBrewer)

m <- 1000
samples <- data.frame(unif1 = runif(m), unif2 = runif(m, 1, 2), norm1 = rnorm(m), 
                      norm2 = rnorm(m, 1), norm3 = rnorm(m, 0, 5))

cor_fun <- function(data, mapping, method="pearson", ndp=2, sz=5, stars=TRUE){ #ndp is to adjust the number of decimals
  
  x <- eval_data_col(data, mapping$x)
  y <- eval_data_col(data, mapping$y)
  
  corr <- cor.test(x, y, method=method)
  est <- corr$estimate
  lb.size <- sz 
  
  if(stars){
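    # rightmost.closed = TRUE keeps p = 1 in the last interval instead of yielding NA
    stars <- c("***", "**", "*", "")[findInterval(corr$p.value, c(0, 0.001, 0.01, 0.05, 1), rightmost.closed = TRUE)]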
    lbl <- paste0(round(est, ndp), stars)
  }else{
    lbl <- round(est, ndp)
  }
  
  ggplot(data=data, mapping=mapping) +
    annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, size=lb.size)+
    theme(panel.grid = element_blank(), panel.background=element_rect(fill="snow1")) 
  
}

colfunc<-colorRampPalette(c("darkblue","cyan","yellow","red"))

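my_fn <- function(data, mapping){
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  p <- ggplot(data = data, mapping = mapping) + 
    stat_density_2d(aes(fill=after_stat(density)), geom="tile", contour = FALSE) +
    scale_fill_gradientn(colours = colfunc(100)) + theme_classic()
  p # return the plot explicitly
}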


ggpairs(samples, columns = c(1,2,3,4,5),
        lower=list(continuous=my_fn),
        diag=list(continuous=wrap("densityDiag", fill="gray92")), #densityDiag is a function
        upper=list(continuous=cor_fun)) + theme(panel.background=element_rect(fill="white")) +
  theme(axis.text.x = element_text(angle = 0, vjust = 1, color = "black")) + 
  theme(axis.text.y = element_text(angle = 0, vjust = 1 , color = "black"))

[Image: resulting ggpairs plot]
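The significance stars in cor_fun come from mapping the p-value of cor.test onto intervals with findInterval. As a standalone sketch of just that step, using base R only (the helper name `star_label` is hypothetical, and `rightmost.closed = TRUE` is an addition here so that a p-value of exactly 1 still maps to the empty label):

```r
# Hypothetical helper: correlation rounded to ndp decimals, followed by significance stars
star_label <- function(x, y, method = "pearson", ndp = 2) {
  ct     <- cor.test(x, y, method = method)
  breaks <- c(0, 0.001, 0.01, 0.05, 1)   # p-value cut points for "***", "**", "*", ""
  stars  <- c("***", "**", "*", "")[findInterval(ct$p.value, breaks,
                                                 rightmost.closed = TRUE)]
  paste0(round(unname(ct$estimate), ndp), stars)
}

set.seed(42)                        # arbitrary seed, only for reproducibility
x <- rnorm(200)
star_label(x, 2 * x + rnorm(200))   # strongly related, so the label should end in "***"
star_label(x, rnorm(200))           # unrelated draws, usually no stars
```

Keeping the label logic in its own function makes it easy to check the thresholds before wiring it into a ggpairs panel.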

answered Oct 29 '22 01:10

Apex


Tags: r, heatmap, density-plot, ggally
