Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ggplot2: geom_ribbon with alpha dependent on data density along y-axis for each x

Is there a way in ggplot2 to produce a geom_ribbon (or other area based geom) with a varying alpha based on the density of points?

The following code produces 50 noisy sine waves, with random x-values for each sample. I don't want to draw every single point as I might want a thousand or more resamples, so I'd like to summarise all these points.

A simple method would be to draw a geom_ribbon covering 95% quantiles. However, firstly this isn't that easy to calculate given the x-values aren't the same for each resample; normally you'd calculate the pointwise quantiles at each of the 100 x points.

Instead I'd like to have ribbon covering the entire area where samples are located, with a continuous alpha gradient, i.e. the ribbon would be darkest in the middle near the actual line and very light at the outlier points. Is this possible in ggplot2?

library(ggplot2)

num_points = 100
num_samples = 50

x = seq(0, 4*pi, length.out=num_points)

sim <- lapply(1:num_samples, function(f) {
    x = runif(num_points, 0, 4*pi)
    y = sin(x) + rnorm(num_points, 0, 0.4)
    data.frame(x=x, y=y)
})

sim.df <- do.call(rbind, sim)
actual = data.frame(x=x, y=sin(x))

ggplot(sim.df, aes(x=x, y=y)) +
    geom_point(alpha=0.7) +
    geom_line(data=actual, colour='blue', size=1.5) 

Plot of noisy sinewaves

like image 846
Stuart Lacy Avatar asked May 19 '16 14:05

Stuart Lacy


1 Answers

One option is to use quantile regression to get the y-values for each quantile at each x-value and then plot those using geom_ribbon.

library(splines)
library(quantreg)
library(reshape2)
library(dplyr)
  1. Set quantiles for density ribbons:

    nq = 50 # Numbre of quantiles
    qq = seq(0,1, length.out=nq)
  2. Run the quantile regression for each quantile. I've used a flexible spline function to get a good fit to the sine function:

    m1 = rq(y ~ ns(x,10), data=sim.df, tau=qq)
  3. Create data frame for use by geom_ribbon to plot density quantiles.

    Create a data frame of regression quantile predictions using predict:

    xvals = seq(min(sim.df$x), max(sim.df$x), length.out=100)
    rqs = data.frame(x=xvals, predict(m1, newdata=data.frame(x=xvals)))
    names(rqs) = c("x", paste0("p",100*qq))

    Reshape the data so that the predictions for each quantile serve as the ymax for one quantile and the ymin for the next quantile in succession (with the exception that the first quantile serves only once as the first ymin and the last quantile serves only once as the last ymax). Put the data in long format so that we can group by quantile in ggplot:

    dat1 = rqs[, -length(rqs)]
    names(dat1)[-1] = paste0(names(dat1)[-1])
    dat2 = rqs[, -2]
    names(dat2)[-1] = paste0(names(dat1)[-1])
    
    dat1 = melt(dat1, id.var="x")
    names(dat1) = c("x","group","min")
    dat2 = melt(dat2, id.var="x")
    names(dat2) = c("x","group1","max")
    
    dat = bind_cols(dat1, dat2)
  4. Now create the plot. We map the quantiles to the alpha aesthetic, and then use scale_alpha_manual to set the alpha values to be higher for quantiles closer to 0.5 and lower for quantiles closer to 0 and 1:

    ggplot() +
      geom_point(data=sim.df, aes(x,y), alpha=0.1, size=0.5, colour="red") +
      geom_ribbon(data=dat, aes(x=x, ymin=min, ymax=max, group=group, alpha=group), 
              fill="blue", lwd=0, show.legend=FALSE) +
      theme_bw() +
      scale_alpha_manual(values=c(seq(0.05,0.9,length.out=floor(0.5*length(qq))),
                                  seq(0.9,0.05,length.out=floor(0.5*length(qq)))))

enter image description here

Here's another example, but with data that has a varying standard deviation:

sim <- lapply(1:num_samples, function(f) {
  x = runif(num_points, 0, 4*pi)
  y = sin(x) + rnorm(num_points, 0, abs(0.7*cos(x))+0.1)
  data.frame(x=x, y=y)
})

sim.df <- do.call(rbind, sim)

Now just run all of the code we created earlier to get this plot:

enter image description here

like image 106
eipi10 Avatar answered Sep 23 '22 16:09

eipi10