Generate graphs in R for certain correlations in a matrix

Q: Can we plot correlation matrix for categorical variables?

Yes, it is possible if you also keep the variable type in a column and you pick the appropriate correlation method based on the types.

Q: How do you graph a correlation table?

Plot it as a heatmap. There are many ways to plot a correlation matrix, and a heatmap is an efficient one. It is easy to read because it shows the correlation of each feature (variable) with every other feature (variable).

Q: How do you do multiple correlations in R?

Call cor() on the data and select the variables of interest by passing a vector of column names. The result is the matrix of correlations among those columns.
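As a minimal sketch of that cor() call: the data frame `d` and its column names `a`, `b`, `c` are made up for illustration and are not part of the question.

```r
# Hypothetical example data; the names d, a, b, c are assumptions
set.seed(1)
d <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))

# Correlation matrix for a chosen subset of columns,
# selected with a character vector of column names
cm <- cor(d[, c("a", "b", "c")])

# Correlation for a single pair of columns
r_ab <- cor(d$a, d$b)
```

The entry `cm["a", "b"]` holds the same value as `r_ab`, and the diagonal of `cm` is 1 because every column is perfectly correlated with itself.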

Tags:

r

ggplot2

plyr

reshape2

I want to generate graphs for pairs of variables (columns) whose correlation is above or below a certain threshold and whose p-value is < 0.01. The graphs would be ggplot2 (line or bar) graphs plotting the two correlated columns (variables).

Here is the gist of my approach so far, with some dummy data. I would love a pointer on where to go next.

library(reshape2)  # melt()
library(ggplot2)

# Create some dummy data
df <- data.frame(sample(1:50), sample(1:50), sample(1:50), sample(1:50))
colnames(df) <- c("var1", "var2", "var3", "var4")

# Find correlations in the dummy data
df.cor <- cor(df)

# Make up some random pvalues for this example
x <- 0:1000
df.cor.pvals <- data.frame(sample(x/1000, 4), sample(x/1000, 4), sample(x/1000, 4), sample(x/1000,4))
colnames(df.cor.pvals) <- c("var1", "var2", "var3", "var4")

# Find the significant correlations
df.cor.extreme <- ((df.cor < -0.01 | df.cor > 0.01) & df.cor.pvals < 0.5)

# Ready the data for plotting
df$rownames <- rownames(df)
df.melt <- melt(df, id="rownames")

# I want to plot the combinations of variables that have a TRUE value
# in the df.cor.extreme matrix

Below is a hardcoded example for the case where var1 and var2 have a value of TRUE. I assume this is where I need some sort of loop to generate one plot for each pair varA, varB that is correlated.

ggplot(df.melt[(df.melt$variable=="var1" | df.melt$variable=="var2"),], aes(x=rownames, y=value, group=variable, colour=variable)) +
  geom_line()

asked Dec 28 '12 05:12

themartinmcfly

1 Answer

As @DrewSteen said in the comments, the p-value matrix must have the same shape as the correlation matrix.

Here is a function that computes the p-value matrix (there should be a built-in function for this, perhaps in the stats package):

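pvalue.matrix <- function(x, ...){
  ncx <- ncol(x)
  r <- matrix(0, nrow = ncx, ncol = ncx)
  ## fill the lower triangle (and the diagonal) with pairwise p-values
  for (i in seq_len(ncx)) {
    for (j in seq_len(i)) {
      x2 <- x[, i]
      y2 <- x[, j]
      ## extra arguments (e.g. method) are passed on to cor.test
      r[i, j] <- cor.test(x2, y2, ...)$p.value
    }
  }
  ## mirror the lower triangle to get a symmetric matrix
  r <- r + t(r) - diag(diag(r))
  rownames(r) <- colnames(x)
  colnames(r) <- colnames(x)
  r
}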

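Then use the vectorized versions of | and & like this (call pvalue.matrix on the numeric columns only, i.e. before the rownames column is added to df, because cor.test needs numeric input):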

df.cor.sig <- (df.cor > 0.01 | df.cor < -0.01) & pvalue.matrix(df) < 0.5
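The logical matrix above is symmetric and TRUE on the diagonal, so each pair shows up twice and every variable matches itself. A minimal sketch of turning such a matrix into a table of unique pairs follows. The dummy data, the thresholds 0.5 and 0.01, and the names `df.p`, `sig` and `sig.pairs` are assumptions for illustration only.

```r
# Dummy data in which var1 and var2 are correlated by construction
set.seed(42)
df <- data.frame(var1 = rnorm(50))
df$var2 <- df$var1 + rnorm(50, sd = 0.3)
df$var3 <- rnorm(50)
df$var4 <- rnorm(50)

df.cor <- cor(df)

# Compact p-value matrix: one cor.test per pair of columns
vars <- colnames(df)
df.p <- sapply(vars, function(a) {
  sapply(vars, function(b) cor.test(df[[a]], df[[b]])$p.value)
})

# Assumed thresholds: |r| > 0.5 and p < 0.01
sig <- abs(df.cor) > 0.5 & df.p < 0.01

# Keep only the upper triangle so each pair is listed once
# and the diagonal (a variable against itself) is dropped
sig[lower.tri(sig, diag = TRUE)] <- FALSE

# Row/column positions of the remaining TRUE cells
idx <- which(sig, arr.ind = TRUE)
sig.pairs <- data.frame(
  varA = rownames(sig)[idx[, "row"]],
  varB = colnames(sig)[idx[, "col"]],
  r    = df.cor[idx],
  p    = df.p[idx]
)
```

Each row of `sig.pairs` names one pair of columns to plot, which is a convenient input for a loop or `lapply`.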

The plot is then a classic geom_tile heatmap:

library(reshape2) ## melt
library(plyr)     ## round_any
library(ggplot2)
dat <- expand.grid(var1=1:4, var2=1:4)
dat$value <- melt(df.cor.sig)$value
dat$labels <- paste(round_any(df.cor,0.01), '(', round_any(pvalue.matrix(df),0.01), ')', sep='')
ggplot(dat, aes(x=var1, y=var2, label=labels)) +
  geom_tile(aes(fill = value), colour='white') +
  geom_text()


Edit after OP clarification

library(grid)      ## nullGrob
library(gridExtra) ## grid.arrange

plots <- apply(dat, 1, function(x){
    ## empty placeholder for cells that are not significant
    plot.grob <- nullGrob()
    ## x[3] is the value column, converted to "TRUE"/"FALSE" by apply
    if(length(grep(pattern='TRUE', x[3])) > 0){
      gg <- paste('var', c(x[1], x[2]), sep='')
      p <- ggplot(subset(df.melt, variable %in% gg),
            aes(x=rownames, y=value, group=variable, colour=variable)) +
            geom_line()
      plot.grob <- ggplotGrob(p)
    }
    plot.grob
})

do.call(grid.arrange, plots)
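The apply() call above visits every cell of the matrix, so each pair is drawn twice and each variable is also paired with itself. A possible alternative, sketched here under assumptions, loops over unique pairs only and returns a named list of ggplot objects. The helper `plot_pair`, the dummy data, and the thresholds 0.5 and 0.01 are hypothetical and not part of the original answer.

```r
library(ggplot2)

# Dummy data in which var1 and var2 are correlated by construction
set.seed(42)
df <- data.frame(var1 = rnorm(50))
df$var2 <- df$var1 + rnorm(50, sd = 0.3)
df$var3 <- rnorm(50)
df$var4 <- rnorm(50)

# Hypothetical helper: line plot of two columns against the row index
plot_pair <- function(data, a, b) {
  long <- data.frame(
    index    = rep(seq_len(nrow(data)), times = 2),
    variable = rep(c(a, b), each = nrow(data)),
    value    = c(data[[a]], data[[b]])
  )
  ggplot(long, aes(x = index, y = value,
                   group = variable, colour = variable)) +
    geom_line() +
    ggtitle(paste(a, "vs", b))
}

# Every unordered pair of columns, each listed once
all.pairs <- combn(colnames(df), 2, simplify = FALSE)

# Keep the pairs that pass both (assumed) thresholds
keep <- Filter(function(p) {
  ct <- cor.test(df[[p[1]]], df[[p[2]]])
  abs(ct$estimate) > 0.5 && ct$p.value < 0.01
}, all.pairs)

# One ggplot object per significant pair, named like "var1_var2"
plots <- lapply(keep, function(p) plot_pair(df, p[1], p[2]))
names(plots) <- vapply(keep, paste, character(1), collapse = "_")
```

A single plot can be shown with `print(plots[["var1_var2"]])`, and the whole list can be laid out on one page with `gridExtra::grid.arrange(grobs = plots)`.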


answered Sep 20 '22 00:09

agstudy
