Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plotting a large number of custom functions in ggplot in R using stat_function()

Tags:

r

ggplot2

The basic issue is that I'd like to figure out how to add a large number (1000) custom functions into the same figure in ggplot, using different values for the function coefficients. I have seen other questions about how to add two or three functions, but not 1000, and questions about adding in different functional forms, but not the same form with multiple values for the parameters...

The goal is to have stat_function draw the lines over using parameters values stored in a data frame, but with no actual data for x.

[The overall goal here is to show the large uncertainty in the model parameters of a non-linear regression from a small dataset, which translates into uncertainty associated with predictions from this data (which I'm trying to convince someone else is a bad idea). I often do this by plotting many lines built from the uncertainty in the model parameters, (a la Andrew Gelman's Multilevel Regression textbook).]

As an example, here is the plot in the base R graphics.

#The data
p.gap <- c(50,45,57,43,32,30,14,36,51)
p.ag <- c(43,24,52,46,28,17,7,18,29)
data <- as.data.frame(cbind(p.ag, p.gap))

#The model (using non-linear least squares regression):
fit.1.nls <- nls(formula=p.gap~beta1*p.ag^(beta2), start=list(beta1=5.065, beta2=0.6168))
summary(fit.1.nls)

#From the summary, I find the means and s.e's the two parameters, and develop their distributions:
beta1 <- rnorm(1000, 7.8945, 3.5689)
beta2 <- rnorm(1000, 0.4894, 0.1282)
coefs <- as.data.frame(cbind(beta1,beta2))

#This is the plot I want (using curve() and base R graphics):
plot(data$p.ag, data$p.gap, xlab="% agricultural land use",
     ylab="% of riparian buffer gap", xlim=c(0,130), ylim=c(0,130), pch=20, type="n")
for (i in 1:1000){curve(coefs[i,1]*x^(coefs[i,2]), add=T, col="grey")}
curve(coef(fit.1.nls)[[1]]*x^(coef(fit.1.nls)[[2]]), add=T, col="red")
points(data$p.ag, data$p.gap, pch=20)

I can plot the mean model function with the data in ggplot:

fit.mean <- function(x){7.8945*x^(0.4894)}
ggplot(data, aes(x=p.ag, y=p.gap)) +
  scale_x_continuous(limits=c(0,100), "% ag land use") +
  scale_y_continuous(limits=c(0,100), "% riparian buffer gap") +
  stat_function(fun=fit.mean, color="red") +
  geom_point()

But nothing I do draws multiple lines in ggplot. I can't seem to find any help on drawing the parameter values from of functions on the ggplot website, or on this site, which are both usually very helpful. Does this violate enough plotting theory that no one dares do this?

Any help is appreciated. Thank you!

like image 620
phalteman Avatar asked Nov 07 '13 21:11

phalteman


1 Answers

It is possible to collect multiple geoms or stats (and even other elements of a plot) into a vector or list and add that vector/list to the plot. Using this, the plyr package can be used to make a list of stat_function, one for each row of coefs

library("plyr")
coeflines <-
alply(as.matrix(coefs), 1, function(coef) {
  stat_function(fun=function(x){coef[1]*x^coef[2]}, colour="grey")
})

Then just add this to the plot

ggplot(data, aes(x=p.ag, y=p.gap)) +
  scale_x_continuous(limits=c(0,100), "% ag land use") +
  scale_y_continuous(limits=c(0,100), "% riparian buffer gap") +
  coeflines +
  stat_function(fun=fit.mean, color="red") +
  geom_point()

enter image description here

A couple of notes:

  • This is slow. It took a few minutes on my computer to draw. ggplot was not designed to be very efficient at handling circa 1000 layers.
  • This just addresses adding the 1000 lines. Per @Roland's comment, I don't know if this represents what you want/expect it to statistically.
like image 162
Brian Diggs Avatar answered Oct 17 '22 00:10

Brian Diggs