Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I superimpose modified loess lines on a ggplot2 qplot?

Background

Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-)

My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make sure that:

  1. The residuals don't systematically covary with the predictor
  2. The residuals are homoscedastic with respect to each predictor in the model

On point (2), my textbook has this to say:

Some statistical packages allow the analyst to plot lowess fit lines at the mean of the residuals (0-line), 1 standard deviation above the mean, and 1 standard deviation below the mean of the residuals....In the present case {their example}, the two lines {mean + 1sd and mean - 1sd} remain roughly parallel to the lowess {0} line, consistent with the interpretation that the variance of the residuals does not change as a function of X. (p. 131)

How can I modify loess lines?

I know how to generate a scatterplot with a "0-line,":

    # First, I'll make a simple linear model and get its diagnostic stats
    library(ggplot2)
    data(cars)
    mod <- fortify(lm(speed ~ dist, data = cars))
    attach(mod)
    str(mod)

    # Now I want to make sure the residuals are homoscedastic
    qplot (x = dist, y = .resid, data = mod) + 
    geom_smooth(se = FALSE) # "se = FALSE" Removes the standard error bands

But does anyone know how I can use ggplot2 and qplot to generate plots where the 0-line, "mean + 1sd" AND "mean - 1sd" lines would be superimposed? Is that a weird/complex question to be asking?

like image 623
briandk Avatar asked Mar 28 '10 00:03

briandk


3 Answers

Apology

Folks, I want to apologize for my ignorance. Hadley is absolutely right, and the answer was right in front of me all along. As I suspected, my question was born of statistical, rather than programmatic ignorance.

We get the 68% Confidence Interval for Free

geom_smooth() defaults to loess smoothing, and it superimposes the +1sd and -1sd lines as part of the deal. That's what Hadley meant when he said "Isn't that just a 68% confidence interval?" I just completely forgot that's what the 68% interval is, and kept searching for something that I already knew how to do. It didn't help that I'd actually turned the confidence intervals off in my code by specifying geom_smooth(se = FALSE).

What my Sample Code Should Have Looked Like

# First, I'll make a simple linear model and get its diagnostic stats.
library(ggplot2)
data(cars)
mod <- fortify(lm(speed ~ dist, data = cars))
attach(mod)
str(mod)

# Now I want to make sure the residuals are homoscedastic.
# By default, geom_smooth is loess and includes the 68% standard error bands.
qplot (x = dist, y = .resid, data = mod) + 
geom_abline(slope = 0, intercept = 0) +
geom_smooth() 

What I've Learned

Hadley implemented a really beautiful and simple way to get what I'd wanted all along. But because I was focused on loess lines, I lost sight of the fact that the 68% confidence interval was bounded by the very lines I needed. Sorry for the trouble, everyone.

like image 196
briandk Avatar answered Nov 02 '22 14:11

briandk


Could you calculate the +/- standard deviation values from the data and add a fitted curve of them to the plot?

like image 45
djq Avatar answered Nov 02 '22 14:11

djq


Have a look at my question "modify lm or loess function.."

I am not sure I followed your question very well, but maybe a:

+ stat_smooth(method=yourfunction)

will work, provided that you define your function as described here.

like image 34
dalloliogm Avatar answered Nov 02 '22 14:11

dalloliogm