This is an edited version of a previous question.
We are given an m by n table of n observations (samples) over m variables (genes, etc), and we are looking to study behavior of the variables between each pair of observations - For instance the two observations having the highest positive or negative correlation. For this purpose I have seen a great chart in Stadler et.al. Nature paper (2011):
Here it could be a sample dataset to be used.
m <- 1000
samples <- data.frame(unif1 = runif(m), unif2 = runif(m, 1, 2), norm1 = rnorm(m),
norm2 = rnorm(m, 1), norm3 = rnorm(m, 0, 5))
I have already tested gpairs(samples)
of package gpairs
that produces this one. It's a good start, but has no option to put correlation coefficients on the upper-right section, nor the density plots on the lower corner:
Next I used ggpairs(samples, lower=list(continuous="density"))
of package GGally
(Thanks @LucianoSelzer for a comment below). Now we have correlations on the upper corner and the densities on the lower corner, but we are missing the diagonal barplots, and the density plots are not heatmap shaped.
Any ideas to make the more closer to the desired picture (the first one)?
The pairwise comparison method (Saaty, 1980) is the most often used procedure for estimating criteria weights in GIS-MCA applications ( Malczewski, 2006a ). The method employs an underlying scale with values from 1 to 9 to rate the preferences with respect to a pair of criteria.
The pairwise comparisons are organized into a matrix: C = [ ckp] n × n; where ckp is the pairwise comparison rating for the k th and p th criteria. The matrix C is reciprocal; that is, c p k = c k p − 1, and all its diagonal elements are unity; that is, ckp = 1, for k = p.
This page titled 12.5: Pairwise Comparisons is shared under a Public Domain license and was authored, remixed, and/or curated by David Lane via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.
When we compare a sample with a theoretical distribution, we can use a Monte Carlo simulation to create a test statistics distribution. For instance, if we want to test whether a p-value distribution is uniformly distributed (i.e. p-value uniformity test) or not, we can simulate uniform random variables and compute the KS test statistic.
You could try to combine several different plotting methods and combine the results. Here's an example, which could be tweaked accordingly:
cors<-round(cor(samples),2) #correlations
# make layout for plot layout
laymat<-diag(1:5) #histograms
laymat[upper.tri(laymat)]<-6:15 #correlations
laymat[lower.tri(laymat)]<-16:25 #heatmaps
layout(laymat) #define layout using laymat
par(mar=c(2,2,2,2)) #define marginals etc.
# Draw histograms, tweak arguments of hist to make nicer figures
for(i in 1:5)
hist(samples[,i],main=names(samples)[i])
# Write correlations to upper diagonal part of the graph
# Again, tweak accordingly
for(i in 1:4)
for(j in (i+1):5){
plot(-1:1,-1:1, type = "n",xlab="",ylab="",xaxt="n",yaxt="n")
text(x=0,y=0,labels=paste(cors[i,j]),cex=2)
}
# Plot heatmaps, here I use kde2d function for density estimation
# image function for generating heatmaps
library(MASS)
for(i in 2:5)
for(j in 1:(i-1)){
k <- kde2d(samples[,i],samples[,j])
image(k,col=heat.colors(1000))
}
edit: Corrected indexing on the last loop.
You can do something like this using three different packages and two different functions as below:
cor_fun
is for the upper triangle correlative calculation.
my_fn
is for the lower triangle plotting
You also need ggpairs
.
library(GGally)
library(ggplot2)
library(RColorBrewer)
m <- 1000
samples <- data.frame(unif1 = runif(m), unif2 = runif(m, 1, 2), norm1 = rnorm(m),
norm2 = rnorm(m, 1), norm3 = rnorm(m, 0, 5))
cor_fun <- function(data, mapping, method="pearson", ndp=2, sz=5, stars=TRUE){ #ndp is to adjust the number of decimals
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
corr <- cor.test(x, y, method=method)
est <- corr$estimate
lb.size <- sz
if(stars){
stars <- c("***", "**", "*", "")[findInterval(corr$p.value, c(0, 0.001, 0.01, 0.05, 1))]
lbl <- paste0(round(est, ndp), stars)
}else{
lbl <- round(est, ndp)
}
ggplot(data=data, mapping=mapping) +
annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, size=lb.size)+
theme(panel.grid = element_blank(), panel.background=element_rect(fill="snow1"))
}
colfunc<-colorRampPalette(c("darkblue","cyan","yellow","red"))
my_fn <- function(data, mapping){
p <- ggplot(data = data, mapping = mapping) +
stat_density2d(aes(fill=..density..), geom="tile", contour = FALSE) +
scale_fill_gradientn(colours = colfunc(100)) + theme_classic()
}
ggpairs(samples, columns = c(1,2,3,4,5),
lower=list(continuous=my_fn),
diag=list(continuous=wrap("densityDiag", fill="gray92")), #densityDiag is a function
upper=list(continuous=cor_fun)) + theme(panel.background=element_rect(fill="white")) +
theme(axis.text.x = element_text(angle = 0, vjust = 1, color = "black")) +
theme(axis.text.y = element_text(angle = 0, vjust = 1 , color = "black"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With