Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pairwise graphical comparison of several distributions

This is an edited version of a previous question.

We are given an m by n table of n observations (samples) over m variables (genes, etc), and we are looking to study behavior of the variables between each pair of observations - For instance the two observations having the highest positive or negative correlation. For this purpose I have seen a great chart in Stadler et.al. Nature paper (2011):

enter image description here

Here it could be a sample dataset to be used.

m <- 1000
samples <- data.frame(unif1 = runif(m), unif2 = runif(m, 1, 2), norm1 = rnorm(m), 
                      norm2 = rnorm(m, 1), norm3 = rnorm(m, 0, 5))

I have already tested gpairs(samples) of package gpairs that produces this one. It's a good start, but has no option to put correlation coefficients on the upper-right section, nor the density plots on the lower corner:

enter image description here

Next I used ggpairs(samples, lower=list(continuous="density")) of package GGally (Thanks @LucianoSelzer for a comment below). Now we have correlations on the upper corner and the densities on the lower corner, but we are missing the diagonal barplots, and the density plots are not heatmap shaped.

enter image description here

Any ideas to make the more closer to the desired picture (the first one)?

like image 557
Ali Avatar asked Mar 19 '13 17:03

Ali


People also ask

What is pairwise comparison in GIS?

The pairwise comparison method (Saaty, 1980) is the most often used procedure for estimating criteria weights in GIS-MCA applications ( Malczewski, 2006a ). The method employs an underlying scale with values from 1 to 9 to rate the preferences with respect to a pair of criteria.

What is the formula for pairwise comparison?

The pairwise comparisons are organized into a matrix: C = [ ckp] n × n; where ckp is the pairwise comparison rating for the k th and p th criteria. The matrix C is reciprocal; that is, c p k = c k p − 1, and all its diagonal elements are unity; that is, ckp = 1, for k = p.

What is the license for pairwise comparisons?

This page titled 12.5: Pairwise Comparisons is shared under a Public Domain license and was authored, remixed, and/or curated by David Lane via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.

How do you compare a sample with a theoretical distribution?

When we compare a sample with a theoretical distribution, we can use a Monte Carlo simulation to create a test statistics distribution. For instance, if we want to test whether a p-value distribution is uniformly distributed (i.e. p-value uniformity test) or not, we can simulate uniform random variables and compute the KS test statistic.


2 Answers

You could try to combine several different plotting methods and combine the results. Here's an example, which could be tweaked accordingly:

cors<-round(cor(samples),2) #correlations

# make layout for plot layout
laymat<-diag(1:5) #histograms
laymat[upper.tri(laymat)]<-6:15 #correlations
laymat[lower.tri(laymat)]<-16:25 #heatmaps

layout(laymat) #define layout using laymat

par(mar=c(2,2,2,2)) #define marginals etc.

# Draw histograms, tweak arguments of hist to make nicer figures
for(i in 1:5) 
  hist(samples[,i],main=names(samples)[i])

# Write correlations to upper diagonal part of the graph
# Again, tweak accordingly
for(i in 1:4)
  for(j in (i+1):5){
    plot(-1:1,-1:1, type = "n",xlab="",ylab="",xaxt="n",yaxt="n")
    text(x=0,y=0,labels=paste(cors[i,j]),cex=2)
    }

# Plot heatmaps, here I use kde2d function for density estimation
# image function for generating heatmaps
library(MASS)
for(i in 2:5)
  for(j in 1:(i-1)){
     k <- kde2d(samples[,i],samples[,j])
     image(k,col=heat.colors(1000))
    } 

edit: Corrected indexing on the last loop. pairwise plot

like image 65
Jouni Helske Avatar answered Oct 29 '22 02:10

Jouni Helske


You can do something like this using three different packages and two different functions as below:

cor_fun is for the upper triangle correlative calculation. my_fn is for the lower triangle plotting

You also need ggpairs.

library(GGally)
library(ggplot2)
library(RColorBrewer)

m <- 1000
samples <- data.frame(unif1 = runif(m), unif2 = runif(m, 1, 2), norm1 = rnorm(m), 
                      norm2 = rnorm(m, 1), norm3 = rnorm(m, 0, 5))

cor_fun <- function(data, mapping, method="pearson", ndp=2, sz=5, stars=TRUE){ #ndp is to adjust the number of decimals
  
  x <- eval_data_col(data, mapping$x)
  y <- eval_data_col(data, mapping$y)
  
  corr <- cor.test(x, y, method=method)
  est <- corr$estimate
  lb.size <- sz 
  
  if(stars){
    stars <- c("***", "**", "*", "")[findInterval(corr$p.value, c(0, 0.001, 0.01, 0.05, 1))]
    lbl <- paste0(round(est, ndp), stars)
  }else{
    lbl <- round(est, ndp)
  }
  
  ggplot(data=data, mapping=mapping) +
    annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, size=lb.size)+
    theme(panel.grid = element_blank(), panel.background=element_rect(fill="snow1")) 
  
}

colfunc<-colorRampPalette(c("darkblue","cyan","yellow","red"))

my_fn <- function(data, mapping){
  p <- ggplot(data = data, mapping = mapping) + 
    stat_density2d(aes(fill=..density..), geom="tile", contour = FALSE) +
    scale_fill_gradientn(colours = colfunc(100)) + theme_classic()
}


ggpairs(samples, columns = c(1,2,3,4,5),
        lower=list(continuous=my_fn),
        diag=list(continuous=wrap("densityDiag", fill="gray92")), #densityDiag is a function
        upper=list(continuous=cor_fun)) + theme(panel.background=element_rect(fill="white")) +
  theme(axis.text.x = element_text(angle = 0, vjust = 1, color = "black")) + 
  theme(axis.text.y = element_text(angle = 0, vjust = 1 , color = "black"))

enter image description here

like image 28
Apex Avatar answered Oct 29 '22 01:10

Apex