Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to extend ggplot2 boxplot with ggproto?

I'm often using boxplots in my work and like ggplot2 aesthetics. But standard geom_boxplot lacks two things important for me: ends of whiskers and median labels. Thanks to information from here I've written a function:

gBoxplot <- function(formula = NULL, data = NULL, font = "CMU Serif", fsize = 18){
  require(ggplot2)
  vars <- all.vars(formula)
  response <- vars[1]
  factor <- vars[2]
  # A function for medians labelling
  fun_med <- function(x){
    return(data.frame(y = median(x), label = round(median(x), 3)))
  }
  p <- ggplot(data, aes_string(x = factor, y = response)) +
  stat_boxplot(geom = "errorbar", width = 0.6) +
  geom_boxplot() +
  stat_summary(fun.data = fun_med, geom = "label", family = font, size = fsize/3, 
                                                                         vjust = -0.1) +
  theme_grey(base_size = fsize, base_family = font)
  return(p)
}

There are also font settings, but this is just because I'm too lazy to make a theme. Here is an example:

gBoxplot(hwy ~ class, mpg)

plot1

Good enough for me, but there are some restrictictions (cannot use auto-dodging, etc.), and it will be better to make a new geom based on geom_boxplot. I've read the vignette Extending ggplot2, but cannot understand how to implement it. Any help will be appreciated.

like image 959
UlvHare Avatar asked Jan 19 '16 12:01

UlvHare


1 Answers

So been thinking about this one for a while. Basically when you create a new primitive, you normally write a combination of:

  1. A layer-function
  2. A stat-ggproto,
  3. A geom-ggproto

Only the layer-function need be visible to the user. You only need to write a stat-ggproto if you need some new way of transforming your data to make your primitive. And you only need write a geom-ggproto if you have some new grid-based graphics to create.

In this case, where we are basically composting layer-function that already exist, we don’t really need to write new ggprotos. It is enough to write a new layer-function. This layer-function will create the three layers that you already are using and map the parameters the way you intend. In this case:

  • Layer1 – uses geom_errorbar and stat_boxplot – to get our errorbars
  • Layer2 – uses geom_boxplot and stat_boxplot - to create the boxplots
  • Layer3 – users geom_label and stat_summary - to create the text labels with the mean value in the center of the boxes.

Of course you could write a new stat-ggproto and a new geom-ggproto that do all of these things at once. Or maybe you compost stat_summary and stat_boxplot into one, and the three geom-protos as well, and this do this with one layer. But there is little point unless we have efficiency problems.

Anyway, here is the code:

geom_myboxplot <- function(formula = NULL, data = NULL,
                           stat = "boxplot", position = "dodge",coef=1.5,
                           font = "sans", fsize = 18, width=0.6,
                           fun.data = NULL, fun.y = NULL, fun.ymax = NULL,
                           fun.ymin = NULL, fun.args = list(),
                           outlier.colour = NULL, outlier.color = NULL,
                           outlier.shape = 19, outlier.size = 1.5,outlier.stroke = 0.5,
                           notch = FALSE,  notchwidth = 0.5,varwidth = FALSE,
                           na.rm = FALSE, show.legend = NA,
                           inherit.aes = TRUE,...) {
    vars <- all.vars(formula)
    response <- vars[1]
    factor <- vars[2]
    mymap <- aes_string(x=factor,y=response)
    fun_med <- function(x) {
        return(data.frame(y = median(x), label = round(median(x), 3)))
    }
    position <- position_dodge(width)
    l1 <- layer(data = data, mapping = mymap, stat = StatBoxplot,
            geom = "errorbar", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(na.rm = na.rm,
                coef = coef, width = width, ...))
    l2 <- layer(data = data, mapping = mymap, stat = stat, geom = GeomBoxplot,
            position = position, show.legend = show.legend, inherit.aes = inherit.aes,
            params = list(outlier.colour = outlier.colour, outlier.shape = outlier.shape,
                outlier.size = outlier.size, outlier.stroke = outlier.stroke,
                notch = notch, notchwidth = notchwidth, varwidth = varwidth,
                na.rm = na.rm, ...))
    l3 <- layer(data = data, mapping = mymap, stat = StatSummary,
            geom = "label", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(fun.data = fun_med,
                fun.y = fun.y, fun.ymax = fun.ymax, fun.ymin = fun.ymin,
                fun.args = fun.args, na.rm=na.rm,family=font,size=fsize/3,vjust=-0.1,...))
    return(list(l1,l2,l3))
}

which allows you to create your customized boxplots it now like this:

ggplot(mpg) +
  geom_myboxplot( hwy ~ class, font = "sans",fsize = 18)+
  theme_grey(base_family = "sans",base_size = 18 )

And they look like this:

enter image description here

Note: we did not actually have to use the layer function, we could have used the orginal stat_boxplot, geom_boxplot, and stat_summary calls in their place. But we still would have had to fill in all the parameters if we wanted to be able to control them from our custom boxplot, so I think it was clearer this way - at least from the point-of-view of structure as opposed to functionality. Maybe it isn't though, it is a matter of taste...

Also I don't have that font which does look a lot nicer. But I did not feel like tracking it down and installing it.

like image 177
Mike Wise Avatar answered Nov 01 '22 11:11

Mike Wise