Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plot of a correlation matrix in R like in Excel example

Tags:

plot

r

I have been trying to minimize my use of Excel in favor of R, but am still stuck when it comes to display simple data cells as is often needed as the last step of an analysis. The following example is one I would like to crack, as it would help me switch to R for this critical part of my workflow.

I would like to illustrate the following correlation matrix in R :

matrix_values <- c(
  NA,1.54,1.63,1.15,0.75,0.78,1.04,1.2,0.94,0.89,
  17.95,1.54,NA,1.92,1.03,0.78,0.89,0.97,0.86,1.27,
  0.95,25.26,1.63,1.92,NA,0.75,0.64,0.61,0.9,0.88,
  1.18,0.74,15.01,1.15,1.03,0.75,NA,1.09,1.03,0.93,
  0.93,0.92,0.86,23.84,0.75,0.78,0.64,1.09,NA,1.2,
  1.01,0.85,0.9,0.88,30.4,0.78,0.89,0.61,1.03,1.2,
  NA,1.17,0.86,0.95,1.02,17.64,1.04,0.97,0.9,0.93,
  1.01,1.17,NA,0.94,1.09,0.93,17.22,1.2,0.86,0.88,
  0.93,0.85,0.86,0.94,NA,0.95,0.96,24.01,0.94,1.27,
  1.18,0.92,0.9,0.95,1.09,0.95,NA,1.25,21.19,0.89,
  0.95,0.74,0.86,0.88,1.02,0.93,0.96,1.25,NA,18.14)
cor_matrix <- matrix(matrix_values, ncol = 10, nrow = 11)

item_names <- c('Item1','Item2','Item3','Item4','Item5',
                'Item6','Item7','Item8','Item9','Item10')
colnames(cor_matrix) <- item_names
rownames(cor_matrix) <- c(item_names, "Size")

The cells should be colored based on their rank (e.g. >95 percentile is completely green, <5 percentile is completely red). The last row should be illustrated by a horizontal bar (representing the fraction of the maximum value).

I have made in Excel the output that I would like to have: correlation matrix

Ideally, I would also like to highlight correlation groups (either manually or by script), like in the following illustration: correlation matrix with highlights

like image 787
nassimhddd Avatar asked Jul 20 '12 15:07

nassimhddd


3 Answers

Your correlation matrix has several values greater than 1, which is not possible. But anyhow...

Try this one

library(reshape2)
dat <- melt(cor_matrix[-11, ])

library(ggplot2)
p <- ggplot(data =  dat, aes(x = Var1, y = Var2)) +
  geom_tile(aes(fill = value), colour = "white") +
  geom_text(aes(label = sprintf("%1.2f",value)), vjust = 1) +
  scale_fill_gradient(low = "white", high = "steelblue")

print(p)

enter image description here

like image 85
MYaseen208 Avatar answered Oct 20 '22 12:10

MYaseen208


Myaseen208 has a good start on the answer. I thought I'd fill in a few more pieces: getting color gradient in the red/green you specified, flipping the order of the y-axis, and cleaning up a few other points (gray background and legend).

library("reshape2")
library("ggplot2")

cor_dat <- melt(cor_matrix[-11,])
cor_dat$Var1 <- factor(cor_dat$Var1, levels=item_names)
cor_dat$Var2 <- factor(cor_dat$Var2, levels=rev(item_names))
cor_dat$pctile <- rank(cor_dat$value, na.last="keep")/sum(!is.na(cor_dat$value))

ggplot(data =  cor_dat, aes(x = Var1, y = Var2)) +
  geom_tile(aes(fill = pctile), colour = "white") +
  geom_text(aes(label = sprintf("%1.1f",value)), vjust = 1) +
  scale_fill_gradientn(colours=c("red","red","white","green","green"),
                       values=c(0,0.05,0.5,0.95,1),
                       guide = "none", na.value = "white") +
  coord_equal() +
  opts(axis.title.x = theme_blank(),
       axis.title.y = theme_blank(),
       panel.background = theme_blank())

enter image description here

EDIT:

Now attempting to get the blue size bars at the bottom.

What makes the size bars harder is that they are a completely different representation of different data than the correlation matrix. So I will first try and make just that part separate and then work on putting them together.

Like with the cor data, first the size data is extracted from the matrix and then turned into a data.frame that has the useful values, including the fraction of the total.

size_dat <- melt(cor_matrix[11,,drop=FALSE])
size_dat$Var2 <- factor(size_dat$Var2, levels=item_names)
size_dat$frac <- size_dat$value / max(size_dat$value)

ggplot(data=size_dat, aes(x=Var2, y=Var1)) +
  geom_blank() +
  geom_rect(aes(xmin = as.numeric(Var2) - 0.5, 
                xmax = as.numeric(Var2) - 0.5 + frac),
            ymin = -Inf, ymax = Inf, fill="blue", color="white")  +
  coord_equal() +
  opts(axis.title.x = theme_blank(),
       axis.title.y = theme_blank(),
       panel.background = theme_blank())

The geom_rect call uses some tricks such as using the numeric representation of the categorical (discrete) variable to position things carefully. Each "item" goes from 0.5 below it to 0.5 above it. So the left edge of the rectangle is 0.5 below the item value, and the right edge is frac to the right of that. Using Inf and -Inf for the y limits means go to the extreme of the plot. This gives

enter image description here

Now to try and put them together. The x scale is common, and the y scales can be made common (though disjoint). Playing with levels and orders is necessary. Also, I flipped x and y in the original (which is fine since it is symmetric). Since the data sets are extracted and formatted a little differently, I've renamed them.

cor_dat2 <- melt(cor_matrix[-(nrow(cor_matrix),])
cor_dat2$Var1 <- factor(cor_dat$Var1, levels=rev(c(item_names, "Size")))
cor_dat2$Var2 <- factor(cor_dat$Var2, levels=item_names)
cor_dat2$pctile <- rank(cor_dat$value, na.last="keep")/sum(!is.na(cor_dat$value))

size_dat2 <- melt(cor_matrix["Size",,drop=FALSE])
size_dat2$Var1 <- factor(size_dat$Var1, levels=rev(c(item_names, "Size")))
size_dat2$Var2 <- factor(size_dat$Var2, levels=item_names)
size_dat2$frac <- size_dat$value / max(size_dat$value)

ggplot(data = cor_dat2, aes(x = Var2, y = Var1)) +
  geom_tile(aes(fill = pctile), colour = "white") +
  geom_text(aes(label = sprintf("%1.1f",value))) +
  geom_rect(data=size_dat2,
            aes(xmin = as.numeric(Var2) - 0.5, 
                xmax = as.numeric(Var2) - 0.5 + frac,
                ymin = as.numeric(Var1) - 0.5,
                ymax = as.numeric(Var1) + 0.5),
            fill="lightblue", color="white")  +
  geom_text(data=size_dat2, 
            aes(x=Var2, y=Var1, label=sprintf("%.0f", value))) +
  scale_fill_gradientn(colours=c("red","red","white","green","green"),
                       values=c(0,0.05,0.5,0.95,1),
                       guide = "none", na.value = "white") +
  scale_y_discrete(drop = FALSE) +
  coord_equal() +
  opts(axis.title.x = theme_blank(),
       axis.title.y = theme_blank(),
       panel.background = theme_blank())

enter image description here

This final version does not assume that it is a 10x10 correlation with an additional row. It can be any number. cor_matrix must have the right names (and "Size" has to be the last row) and item_names must contain the list of items. But it doesn't have to be 10.

like image 36
Brian Diggs Avatar answered Oct 20 '22 12:10

Brian Diggs


Here is an approach using base graphics:

par(mar=c(1, 5, 5, 1))
plot.new()
plot.window(xlim=c(0, 10), ylim=c(0, 11))

quant_vals <- findInterval(cor_matrix[-11, ], 
                           c(-Inf, quantile(cor_matrix[-11, ],
                                            c(0.05, 0.25, 0.45, 0.55, 0.75, 0.95), 
                                            na.rm=TRUE), 
                             Inf))
quant_vals[is.na(quant_vals)] <- 4
cols <- c('#ff0000', '#ff6666', '#ffaaaa', '#ffffff', '#aaffaa', 
          '#66ff66', '#00ff00')
colmat <- matrix(cols[quant_vals], ncol=10, nrow=10)

rasterImage(colmat, 0, 1, 10, 11, interpolate=FALSE)
for (i in seq_along(cor_matrix[11, ])) {
  rect(i - 1, 0.1, i - 1 + cor_matrix[11, i]/max(cor_matrix[11, ]), 0.9, 
       col='lightsteelblue3')
}

text(col(cor_matrix) - 0.5, 11.5 - row(cor_matrix), cor_matrix, font=2)
rect(0, 1, 10, 11)
rect(0, 0, 10, 1)
axis(2, at=(11:1) - 0.5, labels=rownames(cor_matrix), tick=FALSE, las=2)
axis(3, at=(1:10) - 0.5, labels=colnames(cor_matrix), tick=FALSE, las=2)

rect(0, 8, 3, 11, lwd=2)
rect(4, 4, 7, 7, lwd=2)
rect(8, 1, 10, 3, lwd=2)

enter image description here

Data

cor_matrix <- structure(c(NA, 1.54, 1.63, 1.15, 0.75, 0.78, 1.04, 1.2, 0.94, 
0.89, 17.95, 1.54, NA, 1.92, 1.03, 0.78, 0.89, 0.97, 0.86, 1.27, 
0.95, 25.26, 1.63, 1.92, NA, 0.75, 0.64, 0.61, 0.9, 0.88, 1.18, 
0.74, 15.01, 1.15, 1.03, 0.75, NA, 1.09, 1.03, 0.93, 0.93, 0.92, 
0.86, 23.84, 0.75, 0.78, 0.64, 1.09, NA, 1.2, 1.01, 0.85, 0.9, 
0.88, 30.4, 0.78, 0.89, 0.61, 1.03, 1.2, NA, 1.17, 0.86, 0.95, 
1.02, 17.64, 1.04, 0.97, 0.9, 0.93, 1.01, 1.17, NA, 0.94, 1.09, 
0.93, 17.22, 1.2, 0.86, 0.88, 0.93, 0.85, 0.86, 0.94, NA, 0.95, 
0.96, 24.01, 0.94, 1.27, 1.18, 0.92, 0.9, 0.95, 1.09, 0.95, NA, 
1.25, 21.19, 0.89, 0.95, 0.74, 0.86, 0.88, 1.02, 0.93, 0.96, 
1.25, NA, 18.14), .Dim = 11:10)
like image 3
Greg Snow Avatar answered Oct 20 '22 12:10

Greg Snow