Extract interaction terms from regression estimates

Tags: r, regression, lm

This is a simple question but I couldn't find a clear and compelling answer anywhere. If I have a regression model with one or more interaction terms, like:

mod1 <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
coef(summary(mod1))
##                           Estimate Std. Error   t value              Pr(>|t|)
## (Intercept)              22.900000   1.750674 13.080673 0.0000000000006057324
## factor(cyl)6             -3.775000   2.315925 -1.630018 0.1151545663620229670
## factor(cyl)8             -7.850000   1.957314 -4.010599 0.0004547582690011110
## factor(am)1               5.175000   2.052848  2.520888 0.0181760532676256310
## factor(cyl)6:factor(am)1 -3.733333   3.094784 -1.206331 0.2385525615801434851
## factor(cyl)8:factor(am)1 -4.825000   3.094784 -1.559075 0.1310692573417492068

what is a surefire way of identifying which coefficient estimates belong to interaction terms? The obvious way is to grep() for the colon symbol in the coefficient names. But let's assume for a second that this isn't possible, because a factor label itself contains a colon, as in:

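# Note: unique(mtcars$cyl) returns 6, 4, 8, so these labels do not line up with
# the levels 4, 6, 8; only the colon inside the labels matters for this example.
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4,6,8), labels = paste("Cyl:", unique(mtcars$cyl)))
mod2 <- lm(mpg ~ cyl2 * factor(am), data = mtcars)
coef(summary(mod2))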
##                         Estimate Std. Error   t value              Pr(>|t|)
## (Intercept)            22.900000   1.750674 13.080673 0.0000000000006057324
## cyl2Cyl: 4             -3.775000   2.315925 -1.630018 0.1151545663620229670
## cyl2Cyl: 8             -7.850000   1.957314 -4.010599 0.0004547582690011110
## factor(am)1             5.175000   2.052848  2.520888 0.0181760532676256310
## cyl2Cyl: 4:factor(am)1 -3.733333   3.094784 -1.206331 0.2385525615801434851
## cyl2Cyl: 8:factor(am)1 -4.825000   3.094784 -1.559075 0.1310692573417492068

I thought perhaps the terms() object would be useful but it isn't. I could also probably make some assumption about the ordering/numbering of terms to get the intended result:

coef(summary(mod2))[5:6,]
##                         Estimate Std. Error   t value  Pr(>|t|)
## cyl2Cyl: 4:factor(am)1 -3.733333   3.094784 -1.206331 0.2385526
## cyl2Cyl: 8:factor(am)1 -4.825000   3.094784 -1.559075 0.1310693

but I don't know how to do that in a general way.

What can be done?

asked May 17 '18 14:05

Thomas

1 Answer

This seems a little convoluted, but could we just enumerate all the main effects and then take the set difference?

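mod2 <- lm(mpg ~ cyl2 * factor(am) + wt * disp, data = mtcars)
# Term labels of order 1, i.e. the main effects (covers numeric predictors such as wt and disp)
variables <- labels(mod2)[attr(terms(mod2), "order") == 1]
# Coefficient names of factor main effects: variable name pasted to each non-reference level
factors <- sapply(names(mod2$xlevels), function(x) paste0(x, mod2$xlevels[[x]])[-1])
# Whatever remains after dropping the intercept and all main effects is an interaction
setdiff(colnames(model.matrix(mod2)), c("(Intercept)", variables, unlist(factors)))
# [1] "cyl2Cyl: 4:factor(am)1" "cyl2Cyl: 8:factor(am)1" "wt:disp"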
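
A possible alternative, not part of the original answer, avoids rebuilding coefficient names altogether. attr(model.matrix(fit), "assign") maps each model-matrix column to the index of the term that produced it (0 for the intercept), and attr(terms(fit), "order") gives the order of each term, so any column whose term has order greater than 1 is an interaction. This is a minimal sketch, assuming an lm fit with no aliased (NA) coefficients; the names dat, fit, term_of_column, column_order, and interaction_coefs are illustrative only.

```r
# Work on a copy so the built-in mtcars data set is left untouched.
dat <- mtcars
# Same labelling as in the question: the labels contain a colon on purpose.
dat$cyl2 <- factor(dat$cyl, levels = c(4, 6, 8),
                   labels = paste("Cyl:", unique(dat$cyl)))

fit <- lm(mpg ~ cyl2 * factor(am) + wt * disp, data = dat)

# For each model-matrix column: index of the term it came from (0 = intercept).
term_of_column <- attr(model.matrix(fit), "assign")

# For each term: 1 = main effect, 2 = two-way interaction, and so on.
term_order <- attr(terms(fit), "order")

# Prepend 0 so that the intercept (term index 0) maps to order 0.
column_order <- c(0, term_order)[term_of_column + 1]

# Columns generated by a term of order > 1 are interaction coefficients.
interaction_coefs <- colnames(model.matrix(fit))[column_order > 1]
interaction_coefs
# [1] "cyl2Cyl: 4:factor(am)1" "cyl2Cyl: 8:factor(am)1" "wt:disp"

# Pull the matching rows out of the coefficient table.
coef(summary(fit))[interaction_coefs, , drop = FALSE]
```

Because this never parses the coefficient names, colons inside factor labels make no difference. If the fit had aliased coefficients, coef(summary(fit)) would drop those rows, so the names would need to be intersected with rownames(coef(summary(fit))) first.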

answered Sep 20 '22 22:09

Weihuang Wong
