I would like to use ggplot2 to make an upper-triangle correlation matrix like this one. I can replicate that one just fine, but for some reason I'm stuck on wanting to convert the reshape2 functions to tidyr ones. I would think that I could use gather in place of melt, but that is not working.
reshape2
library(reshape2)
library(ggplot2)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
cormat <- round(cor(mydata), 2)
melted_cormat <- melt(cormat)  # full matrix; overwritten below

# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat) {
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}
upper_tri <- get_upper_tri(cormat)
melted_cormat <- melt(upper_tri, na.rm = TRUE)

ggplot(data = melted_cormat, aes(Var2, Var1, fill = value)) +
  geom_tile()
gather from tidyr
library(tidyverse)

# first, the correlation matrix
cor_base <- round(cor(mydata), 2)
# now the upper triangle
cor_base[lower.tri(cor_base)] <- NA
cor_tri <- as.data.frame(cor_base) %>%
  rownames_to_column("Var2") %>%
  gather(key = Var1, value = value, -Var2, na.rm = TRUE) %>%
  as.data.frame()

ggplot(data = cor_tri, aes(x = Var2, y = Var1, fill = value)) +
  geom_tile()
The values are all the same, but some change in order occurred that makes the plot look wrong. A check with identical doesn't return TRUE, but the values of the two data frames seem to be the same:
> identical(cor_tri, melted_cormat)
[1] FALSE
> dim(cor_tri)
[1] 21 3
> dim(melted_cormat)
[1] 21 3
> sum(cor_tri == melted_cormat)
[1] 63
Any thoughts on this, or should I just go ahead and load reshape2 to accomplish what I'm going for?
Thanks.
Essentially, the difference is the factor versus character types of Var1 and Var2 between the reshape2 and tidyr versions. The former's melt() retains factors in the order of the correlation matrix: "mpg", "disp", "hp", "drat", "wt", "qsec", while the latter's tibble::rownames_to_column() creates character columns, which ggplot2 orders alphabetically: "disp", "drat", "hp", "mpg", "qsec", "wt". The two therefore have different level orders, which affects how the plot is rendered.
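The ordering difference can be reproduced with base R alone. The following is a minimal sketch, not part of the original code: the vector names (vars, as_factor, as_character) are made up for illustration, and factor() on a character vector is used as a stand-in for how a discrete axis sorts character values.

```r
# Variable names in the order they appear in the correlation matrix
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# melt()-style: a factor whose levels follow the matrix order
as_factor <- factor(vars, levels = vars)

# rownames_to_column()-style: a plain character vector
as_character <- vars

# Explicit levels are kept in the order given
factor_order <- levels(as_factor)

# Converting a character vector to a factor without explicit levels
# sorts the values, which mirrors how a discrete axis orders characters
character_order <- levels(factor(as_character))

print(factor_order)
print(character_order)
```

Comparing the two printed vectors shows why the tiles land in different positions even though every value matches.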
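To resolve this, consider a dplyr::mutate line using base::factor(row.names(.), ...) that explicitly defines the levels as the original arrangement of cor_base's row.names(), and pass factor_key = TRUE to gather so the key column is stored as a factor in column order. Also, your Var1 and Var2 were reversed.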
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA
cor_tri <- as.data.frame(cor_base) %>%
  mutate(Var1 = factor(row.names(.), levels = row.names(.))) %>%
  gather(key = Var2, value = value, -Var1, na.rm = TRUE, factor_key = TRUE)

ggplot(data = cor_tri, aes(Var2, Var1, fill = value)) +
  geom_tile()
Also, for you or future readers, here is the base::reshape version, which also resolves the factor-level issue above:
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA
cor_base_df <- transform(as.data.frame(cor_base),
                         Var1 = factor(row.names(cor_base), levels = row.names(cor_base)))
cor_long <- subset(reshape(cor_base_df, idvar = c("Var1"),
                           varying = c(1:(ncol(cor_base_df) - 1)), v.names = "value",
                           timevar = "Var2",
                           times = factor(row.names(cor_base), levels = row.names(cor_base)),
                           new.row.names = 1:100,
                           direction = "long"), !is.na(value))

ggplot(data = cor_long, aes(Var2, Var1, fill = value)) +
  geom_tile()
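As a shorter base R variant (a sketch offered as an alternative, not something from the code above), as.table() followed by as.data.frame() also produces Var1 and Var2 as factors in matrix order. The name cor_tab is hypothetical, and the plotting step is left out so the snippet runs without any packages.

```r
# Same input as before: rounded correlations with the lower triangle blanked
mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
cor_base <- round(cor(mydata), 2)
cor_base[lower.tri(cor_base)] <- NA

# as.table() + as.data.frame() yields one row per cell, with Var1 and Var2
# as factors whose levels follow the matrix's row and column order
cor_tab <- as.data.frame(as.table(cor_base), responseName = "value")

# Drop the blanked-out lower triangle
cor_tab <- cor_tab[!is.na(cor_tab$value), ]

str(cor_tab)
```

The resulting data frame should work with the same ggplot(data = ..., aes(Var2, Var1, fill = value)) + geom_tile() call used above.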