Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ggplot2 shade area under density curve by group

Tags:

r

ggplot2

I have this dataframe:

set.seed(1)
x <- c(rnorm(50, mean = 1), rnorm(50, mean = 3))
y <- c(rep("site1", 50), rep("site2", 50))
xy <- data.frame(x, y)

And I have made this density plot:

library(ggplot2)
ggplot(xy, aes(x, color = y)) + geom_density()

enter image description here

For site1 I need to shade the area under the curve that > 1% of the data. For site2 I need to shade the area under the curve that < 75% of the data.

I'm expecting the plot to look something like this (photoshopped). Having been through stack overflow, I'm aware that others have asked how to shade part of the area under a curve, but I cannot figure out how to shade the area under a curve by group.

enter image description here

like image 930
luciano Avatar asked Dec 03 '13 16:12

luciano


2 Answers

Here is one way (and, as @joran says, this is an extension of the response here):

#  same data, just renaming columns for clarity later on
#  also, use data tables
library(data.table)
set.seed(1)
value <- c(rnorm(50, mean = 1), rnorm(50, mean = 3))
site  <- c(rep("site1", 50), rep("site2", 50))
dt    <- data.table(site,value)
#  generate kdf
gg <- dt[,list(x=density(value)$x, y=density(value)$y),by="site"]
#  calculate quantiles
q1 <- quantile(dt[site=="site1",value],0.01)
q2 <- quantile(dt[site=="site2",value],0.75)
# generate the plot
ggplot(dt) + stat_density(aes(x=value,color=site),geom="line",position="dodge")+
  geom_ribbon(data=subset(gg,site=="site1" & x>q1),
              aes(x=x,ymax=y),ymin=0,fill="red", alpha=0.5)+
  geom_ribbon(data=subset(gg,site=="site2" & x<q2),
              aes(x=x,ymax=y),ymin=0,fill="blue", alpha=0.5)

Produces this:

like image 111
jlhoward Avatar answered Sep 29 '22 09:09

jlhoward


The problem with @jlhoward's solution is that you need to manually add goem_ribbon for each group you have. I wrote my own ggplot stat wrapper following this vignette. The benefit of this is that it automatically works with group_by and facet and you don't need to manually add geoms for each group.

StatAreaUnderDensity <- ggproto(
  "StatAreaUnderDensity", Stat,
  required_aes = "x",
  compute_group = function(data, scales, xlim = NULL, n = 50) {
    fun <- approxfun(density(data$x))
    StatFunction$compute_group(data, scales, fun = fun, xlim = xlim, n = n)
  }
)

stat_aud <- function(mapping = NULL, data = NULL, geom = "area",
                    position = "identity", na.rm = FALSE, show.legend = NA, 
                    inherit.aes = TRUE, n = 50, xlim=NULL,  
                    ...) {
  layer(
    stat = StatAreaUnderDensity, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(xlim = xlim, n = n, ...))
}

Now you can use stat_aud function just like other ggplot geoms.

set.seed(1)
x <- c(rnorm(500, mean = 1), rnorm(500, mean = 3))
y <- c(rep("group 1", 500), rep("group 2", 500))
t_critical = 1.5

tibble(x=x, y=y)%>%ggplot(aes(x=x,color=y))+
  geom_density()+
  geom_vline(xintercept = t_critical)+
  stat_aud(geom="area",
           aes(fill=y),
           xlim = c(0, t_critical), 
              alpha = .2)

enter image description here

tibble(x=x, y=y)%>%ggplot(aes(x=x))+
  geom_density()+
  geom_vline(xintercept = t_critical)+
  stat_aud(geom="area",
           fill = "orange",
           xlim = c(0, t_critical), 
              alpha = .2)+
  facet_grid(~y)

enter image description here

like image 38
Lala La Avatar answered Sep 29 '22 09:09

Lala La