 

Shade (fill or color) area under density curve by quantile

Tags: r, ggplot2

Suppose, for example, that I want to shade the area under the density curve of the standard normal distribution by decile: the left-most 10% of the area should have a different shading from the next 10%, and so on.

This is a variant on the questions "Shading a kernel density plot between two points" and "ggplot2 shade area under density curve by group", but I want to shade each quantile (in my example, each group is a decile but the process should easily generalise to other quantiles).

I don't mind whether a solution uses ggplot2 or base graphics, and whether this is done directly from a formula (which would be really neat) or based on making a data frame first. If the latter, you may want:

delta <- 0.0001 
z.df <- data.frame(x = seq(from=-3, to=3, by=delta))
z.df$pdf <- dnorm(z.df$x)
z.df$decile <- floor(10*pnorm(z.df$x) + 1)
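A quick way to check that the decile column really splits the area into ten equal parts is to approximate each band's area with a Riemann sum and compare the points where the label changes against qnorm(). This is a sketch; band_area, cuts and steps are names introduced here for illustration. The eight interior bands should each come out at about 0.1, while the two outer ones fall short by roughly pnorm(-3), about 0.00135, because the grid stops at ±3.

```r
delta <- 0.0001
z.df <- data.frame(x = seq(from = -3, to = 3, by = delta))
z.df$pdf    <- dnorm(z.df$x)
z.df$decile <- floor(10 * pnorm(z.df$x) + 1)

# Approximate area of each decile band: sum of density heights times grid spacing
band_area <- tapply(z.df$pdf, z.df$decile, sum) * delta
print(round(band_area, 4))

# Theoretical decile boundaries of the standard normal
cuts <- qnorm(seq(0.1, 0.9, by = 0.1))
# First grid point at which the decile label steps up by one
steps <- z.df$x[which(diff(z.df$decile) == 1) + 1]
print(round(cbind(cuts, steps), 4))
```

Each entry of steps should sit within one grid spacing (delta) to the right of the matching entry of cuts.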

Note that the naive solution ggplot(z.df, aes(x = x, fill = decile)) + geom_ribbon(aes(ymin = 0, ymax = pdf)) fails with the error "Aesthetics can not vary with a ribbon".
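The naive call fails because of grouping rather than because of ribbons as such: ggplot2 only forms groups automatically from discrete aesthetics, and decile is numeric here, so all rows end up in one ribbon whose fill changes along its length. The sketch below assumes a reasonably recent ggplot2 and uses layer_data() only to inspect the computed groups without drawing anything. It compares a numeric fill with factor(decile), which yields one group, and hence one ribbon, per decile. Mapping group explicitly has the same effect.

```r
library(ggplot2)

delta <- 0.0001
z.df <- data.frame(x = seq(from = -3, to = 3, by = delta))
z.df$pdf    <- dnorm(z.df$x)
z.df$decile <- floor(10 * pnorm(z.df$x) + 1)

# Numeric fill: no discrete aesthetic, so every row falls into a single group
p_num <- ggplot(z.df, aes(x = x, fill = decile)) +
  geom_ribbon(aes(ymin = 0, ymax = pdf))

# Factor fill: ggplot2 groups by the discrete fill, giving one ribbon per decile
p_fac <- ggplot(z.df, aes(x = x, fill = factor(decile))) +
  geom_ribbon(aes(ymin = 0, ymax = pdf))

# layer_data() builds each plot (without rendering it) and exposes the group column
print(length(unique(layer_data(p_num)$group)))
print(length(unique(layer_data(p_fac)$group)))
```

Printing p_fac draws ten ribbons with a discrete fill legend; printing p_num reproduces the error.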

Silverfish asked Nov 28 '14 13:11


2 Answers

Actually, aesthetics can vary within geom_ribbon(...) (or geom_area(...), which is essentially geom_ribbon(...) with ymin fixed at 0), as long as you also set the group aesthetic.

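delta     <- 0.001 
quantiles <- 10
z.df     <- data.frame(x = seq(from=-3, to=3, by=delta))
z.df$pdf <- dnorm(z.df$x)
# split the [0, 1] probability scale into equal-width bins, i.e. equal-area bands
# (breaks=quantiles on its own would split the observed range of pnorm(), not [0, 1])
z.df$qt  <- cut(pnorm(z.df$x), breaks=seq(0, 1, length.out=quantiles+1), labels=FALSE)

library(ggplot2)
ggplot(z.df,aes(x=x,y=pdf))+
  geom_area(aes(group=qt,fill=qt),color="black")+
  scale_fill_gradient2(midpoint=median(unique(z.df$qt)), guide="none") +
  theme_bw()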

Setting quantiles <- 20 at the beginning instead produces a curve shaded in 20 bands of equal area.

jlhoward answered Nov 15 '22 11:11


Something that works and could generalise:

library(ggplot2)
g <- ggplot(z.df, aes(x=x, y=pdf, fill=decile)) +
    scale_fill_gradient2(midpoint=5.5, guide="none") +
    theme_bw()
# add one ribbon layer per decile, so that fill is constant within each layer
for(n in 1:10) {
    g <- g + geom_ribbon(data=z.df[z.df$decile == n,], aes(ymin=0, ymax=pdf), colour = "black")
}
print(g)

I don't find this particularly satisfactory since (1) I have to add a ribbon for each decile, and (2) if I'm using a for loop in R I'm usually doing something wrong.

But the plot it gives is reasonable:

[Image: Normal distribution curve with shaded deciles]

Silverfish answered Nov 15 '22 12:11
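Both objections raised against the loop-based answer (one ribbon per decile, and the for loop) can be sidestepped by drawing the bands straight from the density function, which is also closer to the question's wish to work from a formula. The following is a sketch of an alternative technique not used in either answer. It assumes ggplot2's stat_function() with its xlim argument, takes the band edges from qnorm(), clips the infinite outer edges to ±3, and builds the layers with lapply(), relying on ggplot2 accepting a list of layers on the right-hand side of +. The blue-white-red palette from colorRampPalette() is an arbitrary choice, and q can be changed to get other quantiles.

```r
library(ggplot2)

q      <- 10                                   # number of equal-area bands
probs  <- seq(0, 1, length.out = q + 1)        # 0, 0.1, ..., 1
bounds <- qnorm(probs)                         # -Inf, qnorm(0.1), ..., Inf
bounds <- pmin(pmax(bounds, -3), 3)            # clip the infinite ends to the plotted range
fills  <- colorRampPalette(c("blue", "white", "red"))(q)

# One stat_function() layer per band; each evaluates dnorm() only over its own xlim
bands <- lapply(seq_len(q), function(i) {
  stat_function(fun = dnorm, geom = "area", xlim = bounds[c(i, i + 1)],
                fill = fills[i], colour = "black")
})

# The two-row data frame only sets the x range; the curve itself comes from dnorm()
p <- ggplot(data.frame(x = c(-3, 3)), aes(x = x)) + bands + theme_bw()
print(p)
```

Because every layer computes its own points from dnorm(), no precomputed grid or decile column is needed.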