How to add an external legend to ggpairs()?

Tags: plot, r, ggally

I am plotting a scatterplot matrix using ggpairs. I am using the following code:

# Load required packages
library(GGally)

# Load datasets
data(state)
df <- data.frame(state.x77,
                 State = state.name,
                 Abbrev = state.abb,
                 Region = state.region,
                 Division = state.division
) 
# Create scatterplot matrix
p <- ggpairs(df, 
             # Columns to include in the matrix
             columns = c(3,5,6,7),

             # What to include above diagonal
             # list(continuous = "points") to mirror
             # "blank" to turn off
             upper = "blank",
             legends = TRUE,

             # What to include below diagonal
             lower = list(continuous = "points"),

             # What to include in the diagonal
             diag = list(continuous = "density"),

             # How to label inner plots
             # internal, none, show
             axisLabels = "none",

             # Other aes() parameters
             colour = "Region",
             title = "State Scatterplot Matrix"
) 

# Show the plot
print(p)

and I get the following plot:

[image: the resulting scatterplot matrix]

Now, one can easily see that I am getting legends for every plot in the matrix. I would like to have ONLY ONE universal legend for the whole plot. How do I do that? Any help would be much appreciated.

Patthebug asked Apr 08 '14 18:04


2 Answers

I am working on something similar. This is the approach I would take:

  1. Ensure legends is set to TRUE in the ggpairs function call.
  2. Iterate over the subplots in the plot matrix and remove the legend from every one except one. A single legend is enough because every subplot is coloured by the same variable.

    library(ggplot2)  # theme() and element_blank() come from ggplot2

    colIdx <- c(3,5,6,7)

    for (i in seq_along(colIdx)) {

      # Address only the diagonal elements
      # Get plot out of matrix
      inner <- getPlot(p, i, i)

      # Add any ggplot2 settings you want (blank grid here)
      inner <- inner + theme(panel.grid = element_blank()) +
        theme(axis.text.x = element_blank())

      # Put it back into the matrix
      p <- putPlot(p, inner, i, i)

      for (j in seq_along(colIdx)) {
        inner <- getPlot(p, i, j)
        if (i == 1 && j == 1) {
          # Keep this one legend and move it right
          inner <- inner + theme(legend.position = c(length(colIdx) - 0.25, 0.50))
        } else {
          # Delete legend
          inner <- inner + theme(legend.position = "none")
        }
        p <- putPlot(p, inner, i, j)
      }
    }
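
As a possible alternative to the loop, newer GGally releases let ggpairs() reuse the legend of one subplot for the whole matrix through its legend argument. The sketch below is not part of the original answer. It assumes a GGally version in which ggpairs() takes mapping = aes(...) and legend = ... (the colour = "Region" and legends = TRUE arguments in the question belong to the older interface), and pm is just an illustrative name.

```r
library(GGally)
library(ggplot2)

# Same data frame as in the question
data(state)
df <- data.frame(state.x77,
                 State = state.name,
                 Abbrev = state.abb,
                 Region = state.region,
                 Division = state.division)

# Assumption: a GGally release whose ggpairs() has the 'legend' argument.
# legend = c(1, 1) asks for the legend of the subplot in row 1, column 1
# (a diagonal density coloured by Region) to serve as the matrix legend.
pm <- ggpairs(df,
              columns = c(3, 5, 6, 7),
              mapping = aes(colour = Region),
              upper = list(continuous = "blank"),
              lower = list(continuous = "points"),
              diag = list(continuous = "densityDiag"),
              axisLabels = "none",
              title = "State Scatterplot Matrix",
              legend = c(1, 1))

# The shared legend can be moved with an ordinary ggplot2 theme setting
pm <- pm + theme(legend.position = "bottom")

print(pm)
```

If the installed GGally predates the legend argument, the getPlot()/putPlot() loop above remains the way to go.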
    
ManojVenkat answered Oct 19 '22 04:10


Hopefully, someone will show how this can be done with ggpairs(...). I'd like to see that myself. Until then, here is a solution that does not use ggpairs(...), but rather plain vanilla ggplot with facets.

library(ggplot2)
library(data.table) # data.table(...), setkey(...), and melt(...) for data.tables

# Wide table: row id, grouping variable, and the four plotted columns
xx <- with(df, data.table(id=1:nrow(df), group=Region, df[,c(3,5,6,7)]))
# Long format: one row per (id, variable)
yy <- melt(xx, id.vars=1:2, variable.name="H", value.name="xval")
setkey(yy,id,group)
# Second copy, renamed so that it supplies the y values
ww <- yy[,list(V=H,yval=xval),keyby="id,group"]
# Cartesian join: each variable paired with every variable, per id
zz <- yy[ww,allow.cartesian=TRUE]
setkey(zz,H,V,group)
zz <- zz[,list(id, group, xval, yval, min.x=min(xval),min.y=min(yval),
               range.x=diff(range(xval)),range.y=diff(range(yval))),by="H,V"]
# Densities for the diagonal, rescaled to each panel's y range
d  <- zz[H==V, {
  dens <- density(xval)
  list(x=dens$x, y=min.y[1]+range.y[1]*dens$y/max(dens$y))
}, by="H,V,group"]
ggplot(zz)+
  # Points below the diagonal only
  geom_point(data=zz[xtfrm(H)<xtfrm(V)],
             aes(x=xval, y=yval, color=factor(group)),
             size=3, alpha=0.5)+
  # d already holds only the diagonal (H==V) panels
  geom_line(data=d, aes(x=x, y=y, color=factor(group)))+
  facet_grid(V~H, scales="free")+
  scale_color_discrete(name="Region")+
  labs(x="", y="")

The basic idea is to melt(...) your df into the long format that ggplot needs (xx is the wide table, yy the melted one), make a second copy with the columns renamed (ww), and run a cartesian join of yy and ww on id and group (here, id is just a row number and group is the Region variable) to create zz. We do need to calculate and scale the densities externally (in the data table d). In spite of all that, it still runs faster than ggpairs(...).
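
To make the join step concrete, here is a toy sketch of the same melt-and-self-join pattern on a two-row table. The data and the names wide, long_x, long_y and all_pairs are made up for illustration, and the join uses on = rather than the keys set in the code above.

```r
library(data.table)

# Hypothetical wide table: two observations, two measured variables
wide <- data.table(id = 1:2, group = c("a", "b"),
                   u = c(1, 2), v = c(10, 20))

# Long format: one row per (observation, variable)
long_x <- melt(wide, id.vars = c("id", "group"),
               variable.name = "H", value.name = "xval")

# Second copy with the columns renamed to play the y role
long_y <- copy(long_x)
setnames(long_y, c("H", "xval"), c("V", "yval"))

# Join on id and group: each variable of an observation is paired
# with every variable of that same observation
all_pairs <- long_x[long_y, on = c("id", "group"), allow.cartesian = TRUE]

print(all_pairs[order(id, H, V)])
```

Each observation contributes one row per (H, V) combination, which is exactly what facet_grid(V~H) needs in order to place a point in every panel.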

jlhoward answered Oct 19 '22 05:10