Is there a way in ggplot2 to produce a geom_ribbon (or other area based geom) with a varying alpha based on the density of points?
The following code produces 50 noisy sine waves, with random x-values for each sample. I don't want to draw every single point as I might want a thousand or more resamples, so I'd like to summarise all these points.
A simple method would be to draw a geom_ribbon covering 95% quantiles. However, firstly this isn't that easy to calculate given the x-values aren't the same for each resample; normally you'd calculate the pointwise quantiles at each of the 100 x points.
Instead I'd like to have ribbon covering the entire area where samples are located, with a continuous alpha gradient, i.e. the ribbon would be darkest in the middle near the actual line and very light at the outlier points. Is this possible in ggplot2?
library(ggplot2)
num_points = 100
num_samples = 50
x = seq(0, 4*pi, length.out=num_points)
sim <- lapply(1:num_samples, function(f) {
x = runif(num_points, 0, 4*pi)
y = sin(x) + rnorm(num_points, 0, 0.4)
data.frame(x=x, y=y)
})
sim.df <- do.call(rbind, sim)
actual = data.frame(x=x, y=sin(x))
ggplot(sim.df, aes(x=x, y=y)) +
geom_point(alpha=0.7) +
geom_line(data=actual, colour='blue', size=1.5)
One option is to use quantile regression to get the y-values for each quantile at each x-value and then plot those using geom_ribbon
.
library(splines)
library(quantreg)
library(reshape2)
library(dplyr)
Set quantiles for density ribbons:
nq = 50 # Numbre of quantiles
qq = seq(0,1, length.out=nq)
Run the quantile regression for each quantile. I've used a flexible spline function to get a good fit to the sine function:
m1 = rq(y ~ ns(x,10), data=sim.df, tau=qq)
Create data frame for use by geom_ribbon to plot density quantiles.
Create a data frame of regression quantile predictions using predict
:
xvals = seq(min(sim.df$x), max(sim.df$x), length.out=100)
rqs = data.frame(x=xvals, predict(m1, newdata=data.frame(x=xvals)))
names(rqs) = c("x", paste0("p",100*qq))
Reshape the data so that the predictions for each quantile serve as the ymax
for one quantile and the ymin
for the next quantile in succession (with the exception that the first quantile serves only once as the first ymin
and the last quantile serves only once as the last ymax
). Put the data in long format so that we can group by quantile in ggplot:
dat1 = rqs[, -length(rqs)]
names(dat1)[-1] = paste0(names(dat1)[-1])
dat2 = rqs[, -2]
names(dat2)[-1] = paste0(names(dat1)[-1])
dat1 = melt(dat1, id.var="x")
names(dat1) = c("x","group","min")
dat2 = melt(dat2, id.var="x")
names(dat2) = c("x","group1","max")
dat = bind_cols(dat1, dat2)
Now create the plot. We map the quantiles to the alpha
aesthetic, and then use scale_alpha_manual
to set the alpha values to be higher for quantiles closer to 0.5 and lower for quantiles closer to 0 and 1:
ggplot() +
geom_point(data=sim.df, aes(x,y), alpha=0.1, size=0.5, colour="red") +
geom_ribbon(data=dat, aes(x=x, ymin=min, ymax=max, group=group, alpha=group),
fill="blue", lwd=0, show.legend=FALSE) +
theme_bw() +
scale_alpha_manual(values=c(seq(0.05,0.9,length.out=floor(0.5*length(qq))),
seq(0.9,0.05,length.out=floor(0.5*length(qq)))))
Here's another example, but with data that has a varying standard deviation:
sim <- lapply(1:num_samples, function(f) {
x = runif(num_points, 0, 4*pi)
y = sin(x) + rnorm(num_points, 0, abs(0.7*cos(x))+0.1)
data.frame(x=x, y=y)
})
sim.df <- do.call(rbind, sim)
Now just run all of the code we created earlier to get this plot:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With