
Fill specific regions in geom_violin plot

Tags: r, ggplot2

How can I fill a geom_violin plot in ggplot2 with different colors based on a fixed cutoff?

For instance, given the setup:

library(ggplot2)

set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
                  y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))
dat$f <- with(dat,ifelse(y >= 0,'Above','Below'))

I'd like to take this basic plot:

ggplot() + 
    geom_violin(data = dat,aes(x = factor(x),y = y))

and simply have each violin colored differently above and below zero. The naive thing to try, mapping the fill aesthetic, splits and dodges the violin plots:

ggplot() + 
    geom_violin(data = dat,aes(x = factor(x),y = y, fill = f))

which is not what I want. I'd like a single violin plot at each x value, but with the interior filled with different colors above and below zero.

joran asked Mar 24 '16 14:03


1 Answer

Here's one way to do this.

library(ggplot2)
library(plyr)

#Data setup
set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
                  y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))

First we'll use ggplot2::ggplot_build to capture all the calculated variables that go into plotting the violin plot:

p <- ggplot() + 
    geom_violin(data = dat,aes(x = factor(x),y = y))
p_build <- ggplot2::ggplot_build(p)$data[[1]]
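
As a quick check on what was captured, here is a minimal sketch that pulls the same layer data with ggplot2's layer_data() helper and shows the columns the later steps rely on. The name `built` is only for illustration, and the exact set of columns may vary between ggplot2 versions.

```r
library(ggplot2)

# Same data setup as above
set.seed(123)
dat <- data.frame(x = rep(1:3, each = 100),
                  y = c(rnorm(100, -1), rnorm(100, 0), rnorm(100, 1)))

p <- ggplot() +
    geom_violin(data = dat, aes(x = factor(x), y = y))

# layer_data(p, 1) returns the computed data for the first layer
built <- layer_data(p, 1)

# Columns used by the transformations that follow
head(built[, c("x", "y", "violinwidth", "xmin", "xmax", "group")])
```

Each violin gets its own value of group, and violinwidth is the scaled density that determines how far the outline extends from x at each value of y.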

Next, if we take a look at the source code for geom_violin, we see that it applies some specific transformations to this computed data frame before handing it off to geom_polygon to draw the outlines of the violin regions.

So we'll mimic that process and simply draw the filled polygons manually:

#This comes directly from the source of geom_violin
p_build <- transform(p_build,
                     xminv = x - violinwidth * (x - xmin),
                     xmaxv = x + violinwidth * (xmax - x))

p_build <- rbind(plyr::arrange(transform(p_build, x = xminv), y),
                 plyr::arrange(transform(p_build, x = xmaxv), -y))

I'm omitting a small detail from the source code about duplicating the first row in order to ensure that the polygon is closed.
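
For reference, the omitted step amounts to appending the first vertex to the end of the outline. Below is a minimal base R sketch on a made-up single outline (the `outline` data frame and its values are hypothetical, not taken from p_build). In geom_violin this happens once per violin, so applying it to p_build would mean doing it per group rather than to the whole data frame at once.

```r
# Made-up outline for one violin: left edge going up, right edge coming down
outline <- data.frame(x = c(-0.2, -0.4, -0.1, 0.1, 0.4, 0.2),
                      y = c(-1, 0, 1, 1, 0, -1))

# Repeat the first vertex at the end so the path finishes where it started
closed <- rbind(outline, outline[1, ])
closed
```

geom_polygon joins the last vertex back to the first when it draws, which is presumably why skipping this step does no visible harm in a plain Cartesian plot like this one.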

Now we do two final modifications:

#Add our fill variable
p_build$fill_group <- ifelse(p_build$y >= 0,'Above','Below')
#This is necessary so that ggplot draws six polygons (one per violin
# and fill group) instead of trying to draw three
p_build$group1 <- with(p_build,interaction(factor(group),factor(fill_group)))
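
To see why this yields six polygons, here is a small base R sketch of what interaction() does with two factors. The vectors `violin` and `side` are made-up stand-ins for the group and fill_group columns.

```r
# Made-up stand-ins for the `group` and `fill_group` columns
violin <- c(1, 1, 2, 2, 3, 3)
side   <- c("Above", "Below", "Above", "Below", "Above", "Below")

# One level for every combination of violin and side
combined <- interaction(factor(violin), factor(side))
levels(combined)
nlevels(combined)
```

Three violins times two fill groups gives six levels, and each level becomes its own polygon once it is mapped to the group aesthetic.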

And finally plot:

#Note the use of the group aesthetic here with our computed version,
# group1
p_fill <- ggplot() + 
    geom_polygon(data = p_build,
                 aes(x = x,y = y,group = group1,fill = fill_group))
p_fill


Note that, in general, this approach loses ggplot2's automatic handling of categorical x axis labels, because the polygons are drawn on a continuous x axis. If you need categorical labels, you will usually have to add them manually.
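
One way to add the labels back manually is scale_x_continuous with explicit breaks and labels. The sketch below uses a made-up two-polygon data frame (`toy`, with hypothetical labels "first" and "second") rather than p_build, so that it runs on its own.

```r
library(ggplot2)

# Made-up stand-in for p_build: two squares centred on x = 1 and x = 2
toy <- data.frame(x = c(0.8, 1.2, 1.2, 0.8, 1.8, 2.2, 2.2, 1.8),
                  y = rep(c(0, 0, 1, 1), times = 2),
                  group = rep(1:2, each = 4))

p_toy <- ggplot() +
    geom_polygon(data = toy, aes(x = x, y = y, group = group)) +
    # Put a tick at each polygon's centre and label it by hand
    scale_x_continuous(breaks = 1:2, labels = c("first", "second"))
p_toy
```

With the objects from this answer, the same idea would be p_fill + scale_x_continuous(breaks = 1:3, labels = levels(factor(dat$x))), assuming the violins sit at x = 1, 2, 3 as they do when ggplot2 maps the three factor levels to positions.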

joran answered Oct 15 '22 19:10