
Order of operations for stat_smooth and scale transformation

Tags: r, ggplot2

I'm plotting some log-scaled data with an overlain linear fit line, like so:

library(ggplot2)

d <- data.frame(x = 1:10, y = 10^(1:10 + rnorm(10)))
ggplot(d, aes(x = x, y = y)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_y_log10()


It looks like the linear regression line is being calculated on the transformed data, or else it would go directly through the last point. Is that true?

I seem to remember that this is addressed in the ggplot2 text, but I can't find it now.

Drew Steen asked Jan 14 '23 06:01


1 Answer

When ggplot renders a plot, it does so in the following order:

  1. Map variables to aesthetics (i.e., for each layer, figure out which variable is associated with which aesthetic)
  2. Facet the datasets (make panels)
  3. Transform the scales (through any scale_ functions, typically)
  4. Compute the statistics (i.e., compute the lm fit, in this case; this is where stat_ functions come in, which are typically called through geom_ functions)
  5. Train scales (figure out what the overall plot dimensions should be)
  6. Map scales (figure out where each layer should fit in the overall plot)
  7. Render geoms.
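
Step 3 can be checked directly. The following is a minimal sketch, assuming a recent ggplot2 where `layer_data()` returns the data a layer ends up with after the plot is built; the noise-free data frame is made up for illustration so the expected values are easy to read.

```r
library(ggplot2)

# Hypothetical noise-free data, so log10(y) is simply 1, 2, ..., 10
d <- data.frame(x = 1:10, y = 10^(1:10))

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  scale_y_log10()

# Data for layer 1 (geom_point) after the plot has been built
built <- layer_data(p, 1)

# If the scale transformation has already been applied,
# the built y values are in log10 units
same_units <- all.equal(built$y, log10(d$y))
same_units
```

If `same_units` is `TRUE`, the layer is working with `log10(y)`, not the raw `y`, by the time anything is computed from it or drawn.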

So the scale transformation (step 3) happens before the model is fit (step 4), and hence yes, the fit is being calculated on the transformed data: here, lm is fit to log10(y) against x.
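
One way to confirm this for the plot in the question is to compare the line ggplot2 draws with a model fit by hand on `log10(y)`. This is a sketch rather than a definitive test: the seed is arbitrary and only there for reproducibility, and it assumes `geom_smooth()` is the second layer, as in the question's code.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, not part of the original question
d <- data.frame(x = 1:10, y = 10^(1:10 + rnorm(10)))

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_y_log10()

# Layer 2 is geom_smooth; its computed y values are on the transformed scale
smooth <- layer_data(p, 2)

# Fit the same model by hand on log10(y), then predict at the x values
# that geom_smooth used for its line
fit <- lm(log10(y) ~ x, data = d)
manual <- unname(predict(fit, newdata = data.frame(x = smooth$x)))

fits_match <- all.equal(smooth$y, manual)
fits_match
```

If the two agree, the smoothed line is the regression of `log10(y)` on `x`, which is why it appears straight on the log axis.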

Kevin Ushey answered Jan 22 '23 07:01