In an effort to help populate the R tag here, I am posting a few questions I have often received from students. I have developed my own answers to these over the years, but perhaps there are better ways floating around that I don't know about.
The question: I just ran a regression with continuous y and x but factor f (where levels(f) produces c("level1","level2")):
thelm <- lm(y~x*f,data=thedata)
Now I would like to plot the predicted values of y by x, broken down by groups defined by f. All of the plots I get are ugly and show too many lines.
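The question doesn't include data. Here is a minimal sketch that simulates a data set with the described structure (continuous y and x, a two-level factor f) so the code in this thread can be run as-is. The sample size, coefficients, and seed are arbitrary assumptions, not anything from the original data.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible
n <- 50
# hypothetical data: continuous x and a two-level factor f
thedata <- data.frame(
  x = runif(n, 0, 10),
  f = factor(rep(c("level1", "level2"), length.out = n))
)
# assumed slopes: 1 in level1, 2 in level2, plus Gaussian noise
thedata$y <- 3 + ifelse(thedata$f == "level2", 2, 1) * thedata$x + rnorm(n)
thelm <- lm(y ~ x * f, data = thedata)
coef(thelm)
```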
My answer: Try the predict() function.
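##restrict prediction to the valid data
##from the model by using thelm$model rather than thedata
newdat <- expand.grid(x=range(thelm$model$x), f=levels(thelm$model$f))
##store the predictions alongside the grid they belong to (not in thedata)
newdat$yhat <- predict(thelm, newdata=newdat)
##set ylim so the second group's line is not clipped
plot(yhat~x, data=newdat, subset=f=="level1", type="l", ylim=range(newdat$yhat))
lines(yhat~x, data=newdat, subset=f=="level2", lty=2)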
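The plot()/lines() pair has to be extended by hand for every extra level of f. A minimal sketch that loops over levels(f) instead is below; it assumes simulated three-group data (the group names a, b, c and the slopes are made up for illustration).

```r
set.seed(2)
# hypothetical data with three groups, to show the loop scales with nlevels(f)
thedata <- data.frame(x = runif(60, 0, 10),
                      f = factor(rep(c("a", "b", "c"), each = 20)))
thedata$y <- as.numeric(thedata$f) * thedata$x + rnorm(60)
thelm <- lm(y ~ x * f, data = thedata)

# prediction grid: both ends of the observed x range for every level of f
newdat <- expand.grid(x = range(thelm$model$x),
                      f = levels(thelm$model$f))
newdat$yhat <- predict(thelm, newdata = newdat)

# empty plot sized to hold every line, then one line per level
plot(yhat ~ x, data = newdat, type = "n")
levs <- levels(newdat$f)
for (i in seq_along(levs)) {
  lines(yhat ~ x, data = newdat, subset = f == levs[i], col = i, lty = i)
}
legend("topleft", legend = levs, col = seq_along(levs), lty = seq_along(levs))
```

Starting from an empty plot (type = "n") scaled to all of newdat$yhat keeps every group's line inside the plotting region, whichever level happens to be drawn first.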
Are there other ideas out there that are (1) easier to understand for a newcomer and/or (2) better from some other perspective?
To understand potential interaction effects, compare the lines in the interaction plot: if the lines are roughly parallel, there is little or no interaction; if they are clearly not parallel, there is an interaction.
Interaction plots are used to understand how the behavior of one variable depends on the value of another variable. Interaction effects are analyzed in regression analysis, DOE (design of experiments), and ANOVA (analysis of variance).
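The parallel-lines check can also be read directly off the fitted model. With R's default treatment contrasts, the slope of x in level1 is the x coefficient and the slope in level2 adds the x:flevel2 coefficient, so that interaction term is exactly the difference between the two slopes. A sketch on simulated data (the names d and fit and the true slopes 1 and 2.5 are assumptions for illustration):

```r
set.seed(3)
# hypothetical data: slope of x is 1 in "level1" and 2.5 in "level2"
d <- data.frame(x = rnorm(100),
                f = factor(rep(c("level1", "level2"), each = 50)))
d$y <- ifelse(d$f == "level1", 1, 2.5) * d$x + rnorm(100)
fit <- lm(y ~ x * f, data = d)
b <- coef(fit)

# group-specific slopes implied by the interaction model
slopes <- c(level1 = b[["x"]],
            level2 = b[["x"]] + b[["x:flevel2"]])
slopes

# parallel lines correspond to x:flevel2 = 0; its estimate, SE, t and p:
summary(fit)$coefficients["x:flevel2", ]
```

Because the estimates are noisy, lines from a sample are rarely exactly parallel; the standard error and p-value on x:flevel2 indicate whether the difference in slopes is larger than sampling variation alone would suggest.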
The effects package has good plotting methods for visualizing the predicted values of regressions.
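##make f an explicit factor: since R 4.0, data.frame() keeps strings as character,
##and as.numeric() on a character column gives NA
thedata <- data.frame(x=rnorm(20), f=factor(rep(c("level1","level2"), 10)))
##slope of x is 0 in level1 and 1 in level2, plus noise with sd = 3
thedata$y <- rnorm(20, sd=3) + thedata$x*(as.numeric(thedata$f)-1)
library(effects)
model.lm <- lm(formula=y ~ x*f, data=thedata)
plot(effect(term="x:f", mod=model.lm, default.levels=20), multiline=TRUE)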
Huh, still trying to wrap my brain around expand.grid(). Just for comparison's sake, this is how I'd do it (using ggplot2):
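For anyone else puzzling over expand.grid(): it builds a data frame containing every combination of the values supplied, with the first argument varying fastest. A tiny illustration (the values are chosen arbitrarily):

```r
# every combination of x and f; x varies fastest
g <- expand.grid(x = c(0, 10), f = c("level1", "level2"))
g
# 4 rows: (0, level1), (10, level1), (0, level2), (10, level2)
```

Passed as newdata to predict(), a grid like this yields one fitted value per row, here the two endpoints of each group's line.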
library(ggplot2)
##name the columns so aes() can find them, and avoid overwriting thedata
preds <- data.frame(yhat=predict(thelm), x=thelm$model$x, f=thelm$model$f)
ggplot(preds, aes(x = x, y = yhat, group = f, color = f)) + geom_line()
The ggplot() logic is pretty intuitive, I think: group and color the lines by f. As the number of groups grows, not having to specify a layer for each one becomes increasingly helpful.
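To push the many-groups point further, here is a sketch that combines the two ideas: predictions on an expand.grid() grid, drawn with ggplot2. The four-level simulated data set, the names dat, fit and grid, and the 50-point grid are assumptions for illustration.

```r
library(ggplot2)

set.seed(4)
# hypothetical data with four groups; each group gets its own slope
dat <- data.frame(x = runif(80, 0, 10),
                  f = factor(rep(paste0("level", 1:4), each = 20)))
dat$y <- as.numeric(dat$f) * dat$x + rnorm(80)
fit <- lm(y ~ x * f, data = dat)

# predictions on a grid: 50 x values spanning the data, for every level of f
grid <- expand.grid(x = seq(min(dat$x), max(dat$x), length.out = 50),
                    f = levels(dat$f))
grid$yhat <- predict(fit, newdata = grid)

# one line per level with no manual layering; raw points added for context
p <- ggplot(grid, aes(x = x, y = yhat, colour = f)) +
  geom_line() +
  geom_point(data = dat, aes(y = y), alpha = 0.4)
print(p)
```

A shortcut that skips the model object is geom_smooth(method = "lm") with colour = f. It fits a separate regression within each group, which gives the same fitted lines as y ~ x*f with a single factor, although its confidence bands use each group's own residual variance rather than the pooled one.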