This is a simple question but I couldn't find a clear and compelling answer anywhere. If I have a regression model with one or more interaction terms, like:
mod1 <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
coef(summary(mod1))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.900000 1.750674 13.080673 0.0000000000006057324
## factor(cyl)6 -3.775000 2.315925 -1.630018 0.1151545663620229670
## factor(cyl)8 -7.850000 1.957314 -4.010599 0.0004547582690011110
## factor(am)1 5.175000 2.052848 2.520888 0.0181760532676256310
## factor(cyl)6:factor(am)1 -3.733333 3.094784 -1.206331 0.2385525615801434851
## factor(cyl)8:factor(am)1 -4.825000 3.094784 -1.559075 0.1310692573417492068
what is a surefire way of identifying which coefficient estimates are for interaction terms? The obvious way is to grep() for the colon symbol in the term names. But let's assume for a second that's not possible because of something like:
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4,6,8), labels = paste("Cyl:", unique(mtcars$cyl)))
mod2 <- lm(mpg ~ cyl2 * factor(am), data = mtcars)
coef(summary(mod2))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.900000 1.750674 13.080673 0.0000000000006057324
## cyl2Cyl: 4 -3.775000 2.315925 -1.630018 0.1151545663620229670
## cyl2Cyl: 8 -7.850000 1.957314 -4.010599 0.0004547582690011110
## factor(am)1 5.175000 2.052848 2.520888 0.0181760532676256310
## cyl2Cyl: 4:factor(am)1 -3.733333 3.094784 -1.206331 0.2385525615801434851
## cyl2Cyl: 8:factor(am)1 -4.825000 3.094784 -1.559075 0.1310692573417492068
I thought perhaps the terms() object would be useful but it isn't. I could also probably make some assumption about the ordering/numbering of terms to get the intended result:
coef(summary(mod2))[5:6,]
## Estimate Std. Error t value Pr(>|t|)
## cyl2Cyl: 4:factor(am)1 -3.733333 3.094784 -1.206331 0.2385526
## cyl2Cyl: 8:factor(am)1 -4.825000 3.094784 -1.559075 0.1310693
but I don't know how to do that in a general way.
What can be done?
To understand potential interaction effects, compare the lines from the interaction plot: If the lines are parallel, there is no interaction. If the lines are not parallel, there is an interaction.
Adding interaction terms to a regression model has real benefits. It greatly expands your understanding of the relationships among the variables in the model. And you can test more specific hypotheses. But interpreting interactions in regression takes understanding of what each coefficient is telling you.
A common interaction term is a simple product of the predictors in question. For example, in SPSS a product interaction between VARX and VARY can be computed and called INTXY with the command COMPUTE INTXY = VARX * VARY. The new predictor is then included in a REGRESSION procedure alongside VARX and VARY.
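For readers following along in R rather than SPSS, the same idea can be sketched as below. It assumes the built-in mtcars data, with wt and disp standing in for VARX and VARY (an illustrative choice, not something from the text above), and checks that a hand-computed product column gives the same fit as the wt * disp formula shorthand.

```r
# Work on a copy so the original mtcars data frame is left untouched
d <- mtcars

# Hand-computed product term, analogous to COMPUTE INTXY = VARX * VARY
d$INTXY <- d$wt * d$disp

# Fit once with the manual product column and once with the formula shorthand
mod_manual  <- lm(mpg ~ wt + disp + INTXY, data = d)
mod_formula <- lm(mpg ~ wt * disp, data = d)

# The "wt:disp" column that R builds should be the same product
same_column <- all.equal(unname(model.matrix(mod_formula)[, "wt:disp"]), d$INTXY)

# The two fits should agree coefficient by coefficient (only the names differ)
same_coefs <- all.equal(unname(coef(mod_manual)), unname(coef(mod_formula)))

same_column
same_coefs
```

This only covers two numeric predictors. With factors, R expands the interaction into one product column per combination of non-reference levels, which is why the question's output has two interaction rows.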
An interaction effect occurs when the effect of one variable depends on the value of another variable. Interaction effects are common in regression models, ANOVA, and designed experiments.
This seems a little convoluted, but could we just enumerate all the main effects and then take the set difference?
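# Refit with a second, purely numeric interaction (wt:disp) added
mod2 <- lm(mpg ~ cyl2 * factor(am) + wt * disp, data = mtcars)
# First-order term labels; for numeric predictors these are also the column names
variables <- labels(mod2)[attr(terms(mod2), "order") == 1]
# Main-effect columns of factors: term name pasted to each non-reference level
factors <- sapply(names(mod2$xlevels), function(x) paste0(x, mod2$xlevels[[x]])[-1])
# Whatever remains after dropping the intercept and main effects is an interaction
setdiff(colnames(model.matrix(mod2)), c("(Intercept)", variables, unlist(factors)))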
# [1] "cyl2Cyl: 4:factor(am)1" "cyl2Cyl: 8:factor(am)1" "wt:disp"
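An alternative that avoids matching on names at all is sketched below. It relies on two base R facts: attr(terms(model), "order") gives the interaction order of each term, and attr(model.matrix(model), "assign") maps each design-matrix column back to its term (0 for the intercept). The helper name is_interaction_coef() is hypothetical, made up for this illustration, and the sketch has only been thought through for lm-style fits like the ones in the question.

```r
# Rebuild the data and model from the question so the example is self-contained
data(mtcars)
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4, 6, 8),
                      labels = paste("Cyl:", unique(mtcars$cyl)))
mod2 <- lm(mpg ~ cyl2 * factor(am) + wt * disp, data = mtcars)

# Hypothetical helper (not part of base R): TRUE for every coefficient
# that belongs to a term of order 2 or higher
is_interaction_coef <- function(model) {
  X <- model.matrix(model)
  term_order <- attr(terms(model), "order")  # 1 = main effect, 2+ = interaction
  assign_idx <- attr(X, "assign")            # term index per column, 0 = intercept
  out <- logical(length(assign_idx))         # intercept stays FALSE
  has_term <- assign_idx > 0
  out[has_term] <- term_order[assign_idx[has_term]] > 1
  names(out) <- colnames(X)
  out
}

interaction_names <- names(which(is_interaction_coef(mod2)))
interaction_names
# [1] "cyl2Cyl: 4:factor(am)1" "cyl2Cyl: 8:factor(am)1" "wt:disp"

# Pull the matching estimates; coef() keeps NA entries for aliased terms,
# so the logical index lines up with the model matrix columns
coef(mod2)[is_interaction_coef(mod2)]
```

Because nothing here depends on the colon character, factor labels such as "Cyl: 4" do not get in the way. Indexing coef(summary(mod2)) by these names should also work as long as no coefficient is dropped as aliased.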