
Plotting multiple correlation matrices by a categorical variable using ggcorrplot

I created a simple correlation matrix using the ggcorrplot package and the following code:

library(ggcorrplot)
corr <- round(cor(data[,18:24], use = "complete.obs"),2)
gg <- ggcorrplot(corr)
print(gg)

What I would now like to do is create multiple correlation matrices from the same data, split by a categorical variable called "region" (column 5), similar to what facet_wrap does. However, when I try that, I get an error. I've tried the following:

library(ggcorrplot)
corr <- round(cor(data[,18:24], use = "complete.obs"),2)
gg <- ggcorrplot(corr) +
facet_wrap("region", ncol = 2)
print(gg)

The error I get is "Error in combine_vars(data, params$plot_env, vars, drop = params$drop) : At least one layer must contain all variables used for facetting"

I understand that 'corr' is not referencing the "region" field, and I was wondering how I can accomplish this. So basically, the output would be 6 correlation matrices separated by "region" instead of just one correlation matrix for all of the data.

A.G. asked Nov 07 '25 05:11


1 Answer

This probably isn't possible with ggcorrplot itself. It takes a correlation matrix as input, melts it into a long-format data frame, and builds the plot from that data frame with ordinary ggplot2 layers, so there is no "region" column left for facet_wrap to use.

But you could use the ggcorrplot source code to get what you want.

As a preliminary step, let's look at a "melted" correlation matrix.

(small_cor <- cor(replicate(2, rnorm(25))))
#>            [,1]       [,2]
#> [1,] 1.00000000 0.06064063
#> [2,] 0.06064063 1.00000000
(reshape2::melt(small_cor))
#>   Var1 Var2      value
#> 1    1    1 1.00000000
#> 2    2    1 0.06064063
#> 3    1    2 0.06064063
#> 4    2    2 1.00000000

It's a data frame version of a correlation matrix, where each row holds the correlation for one pair of variables from the original data.
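As a small illustration of the same idea with named columns, here is a sketch using the built-in mtcars data (my choice for the example, not part of the original question). With column names present, Var1 and Var2 carry the variable names, which is what later ends up on the plot axes.

```r
library(reshape2)

# Correlations among three named columns of the built-in mtcars data,
# rounded to two decimals as in the question
named_cor <- round(cor(mtcars[, c("mpg", "cyl", "disp")]), 2)

# Long format: one row per (Var1, Var2) pair of variables
melted <- melt(named_cor)

str(melted)
levels(melted$Var1)
```

Each of the 3 x 3 cells of the matrix becomes one row, so the melted data frame has nine rows.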

Now we'll get down to work with some sample data. There are 6 regions and 7 variables.

library(tidyverse)
library(reshape2)

my_data <- data.frame(region = factor(rep(1:6, each = 25)),
                      replicate(7, rnorm(6*25)))

We need the melted correlation matrices with the region IDs. Here's how I did it. There might be a nicer way. I think this might be the trickiest thing you'll have to do.

my_cors <- cbind(region = factor(rep(levels(my_data$region), each = 7^2)),
              do.call(rbind, lapply(split(my_data, my_data$region), function(x) melt(cor(x[,-1])))))
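One possible "nicer way", offered as a sketch rather than a tested replacement: dplyr::bind_rows() with its .id argument can take the region labels from the names of the split list, so they do not have to be rebuilt by hand with rep(). The name my_cors_alt is only for illustration, and the seed is added so the example is reproducible.

```r
library(dplyr)
library(reshape2)

set.seed(123)  # assumption: fixed seed, only for reproducibility
my_data <- data.frame(region = factor(rep(1:6, each = 25)),
                      replicate(7, rnorm(6 * 25)))

# One melted correlation matrix per region, as a named list ("1" to "6")
cors_by_region <- lapply(split(my_data[, -1], my_data$region),
                         function(x) melt(cor(x)))

# Stack the pieces; .id stores each list name in a new "region" column
my_cors_alt <- bind_rows(cors_by_region, .id = "region")

# bind_rows() returns the ids as character, so restore the factor levels
my_cors_alt$region <- factor(my_cors_alt$region,
                             levels = levels(my_data$region))
```

The result has the same columns (region, Var1, Var2, value) as my_cors, so it should work with the same plotting code.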

Now I will copy and paste from the ggcorrplot source code. First, some defaults taken from its argument list:

ggtheme = ggplot2::theme_minimal
colors = c("blue", "white", "red")
outline.color = "gray"
legend.title = "Corr"
tl.cex = 12
tl.srt = 45

Now I cut and paste the relevant parts of ggcorrplot and stick a facet_wrap at the end to get what you wanted.

my_cors %>% 
  ggplot(aes(Var1, Var2, fill = value)) + 
  geom_tile(color = outline.color) + 
  scale_fill_gradient2(low = colors[1], 
                       high = colors[3], 
                       mid = colors[2], 
                       midpoint = 0,
                       limits = c(-1, 1), 
                       space = "Lab", 
                       name = legend.title) + 
  ggtheme() + theme(axis.text.x = element_text(angle = tl.srt,
                                               vjust = 1, 
                                               size = tl.cex, hjust = 1), 
                    axis.text.y = ggplot2::element_text(size = tl.cex)) + 
  coord_fixed() +
  facet_wrap("region", ncol=2)
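To apply this to data shaped like the question's (a "region" column plus numeric variables in columns 18 to 24), the melting step could be wrapped in a small helper. This is a sketch under assumptions: melted_cors_by_group is a hypothetical name, not a ggcorrplot function, and demo is made-up stand-in data. It keeps the question's use = "complete.obs" and rounding to two decimals.

```r
library(reshape2)

# Hypothetical helper (not part of ggcorrplot): one melted correlation
# matrix per level of a grouping column, stacked into one data frame.
melted_cors_by_group <- function(df, cols, group) {
  pieces <- split(df[, cols], df[[group]], drop = TRUE)
  melted <- lapply(names(pieces), function(g) {
    m <- round(cor(pieces[[g]], use = "complete.obs"), 2)
    long <- melt(m)
    long[[group]] <- g                       # label every row with its group
    long[, c(group, "Var1", "Var2", "value")]
  })
  out <- do.call(rbind, melted)
  out[[group]] <- factor(out[[group]], levels = names(pieces))
  out
}

# Stand-in for the question's data: 24 columns, "region" in column 5
set.seed(42)
demo <- data.frame(matrix(rnorm(150 * 24), ncol = 24))
names(demo)[5] <- "region"
demo$region <- rep(c("A", "B", "C", "D", "E", "F"), each = 25)

region_cors <- melted_cors_by_group(demo, cols = 18:24, group = "region")
```

With the real data, the call would presumably be melted_cors_by_group(data, cols = 18:24, group = "region"), and the result can be passed to the ggplot() code above in place of my_cors.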

[Image: the resulting plot, one correlation heatmap per region, faceted in two columns]

ngm answered Nov 10 '25 22:11


