I do not want the main effect because it is collinear with a finer factor fixed effect, so it is annoying to have these NAs.
In this example:
lm(y ~ x * z)
I want the interaction of x (numeric) and z (factor), but not the main effect of z.
The simple answer is no, you don't always need main effects when there is an interaction. However, the interaction term will not have the same meaning as it would if both main effects were included in the model.
When you have statistically significant interactions, you cannot interpret the main effect without considering the interaction effects.
If the lines on an interaction plot are parallel, there is no interaction between the factors; if the lines cross, an interaction is likely.
Adding interaction terms to a regression model has real benefits. It greatly expands your understanding of the relationships among the variables in the model, and you can test more specific hypotheses. But interpreting interactions in regression requires understanding what each coefficient is telling you.
The R documentation of ?formula says:
The ‘*’ operator denotes factor crossing: ‘a * b’ is interpreted as ‘a + b + a:b’.
So it sounds like dropping a main effect is straightforward: just use one of the following:
a + a:b ## main effect of `b` is dropped
b + a:b ## main effect of `a` is dropped
a:b ## main effects of both `a` and `b` are dropped
Oh, really? No no no (too simple, too naive). In reality it depends on the variable classes of a and b.
This behavior is due to a magic function called model.matrix.default, which constructs a design matrix from a formula. A numeric variable is included as is in a single column, but a factor variable is automatically coded as a set of dummy columns. It is exactly this dummy recoding that is the magic. It is commonly believed that we can enable or disable contrasts to control it, but not really; we lose control of contrasts even in this simplest example. The problem is that model.matrix.default has its own rules for dummy encoding, and it is very sensitive to how you specify the model formula. It is exactly for this reason that we can't drop a main effect when an interaction between two factors exists.
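As a quick check of that last claim (a minimal sketch with two made-up 2-level factors a and b, not from the question): even if the formula omits the main effect of b, the fitted model is the same as the full crossing.
set.seed(1)
a <- gl(2, 10, labels = c("a1", "a2"))    ## hypothetical 2-level factor
b <- gl(2, 5, 20, labels = c("b1", "b2")) ## hypothetical 2-level factor
yy <- rnorm(20)                           ## hypothetical response
fit_drop <- lm(yy ~ a + a:b)  ## formula without the main effect of `b`
fit_full <- lm(yy ~ a * b)    ## full factor crossing
all.equal(fitted(fit_drop), fitted(fit_full))
# [1] TRUE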
From your question, x is numeric and z is a factor. You can specify a model with the interaction but without the main effect of z by
y ~ x + x:z
Since x is numeric, this is equivalent to
y ~ x:z
The only difference here is the parametrization (that is, how model.matrix.default does the dummy encoding). Consider a small example:
set.seed(0)
y <- rnorm(10)
x <- rnorm(10)
z <- gl(2, 5, labels = letters[1:2])
fit1 <- lm(y ~ x + x:z)
#Coefficients:
#(Intercept) x x:zb
# 0.1989 -0.1627 -0.5456
fit2 <- lm(y ~ x:z)
#Coefficients:
#(Intercept) x:za x:zb
# 0.1989 -0.1627 -0.7082
From the names of the coefficients we see that in the 1st specification, z is contrasted, so its 1st level "a" is not dummy encoded, while in the 2nd specification, z is not contrasted and both levels "a" and "b" are dummy encoded. Given that both specifications end up with three coefficients, they are really equivalent (mathematically speaking, the design matrices in the two cases have the same column space), and you can verify this by comparing their fitted values:
all.equal(fit1$fitted, fit2$fitted)
# [1] TRUE
So why is z contrasted in the first case? Because otherwise we get two dummy columns for x:z, and the sum of those two columns is just x, aliased with the existing model term x in the formula. In fact, in this case, even if you request no contrasts, model.matrix.default will not obey:
model.matrix.default(y ~ x + x:z,
contrast.arg = list(z = contr.treatment(nlevels(z), contrasts = FALSE)))
# (Intercept) x x:zb
#1 1 0.7635935 0.0000000
#2 1 -0.7990092 0.0000000
#3 1 -1.1476570 0.0000000
#4 1 -0.2894616 0.0000000
#5 1 -0.2992151 0.0000000
#6 1 -0.4115108 -0.4115108
#7 1 0.2522234 0.2522234
#8 1 -0.8919211 -0.8919211
#9 1 0.4356833 0.4356833
#10 1 -1.2375384 -1.2375384
So why is z not contrasted in the 2nd case? Because if it were, we would lose the effect of level "a" when constructing the interaction. And even if you request a contrast, model.matrix.default will just ignore you:
model.matrix.default(y ~ x:z,
contrast.arg = list(z = contr.treatment(nlevels(z), contrasts = TRUE)))
# (Intercept) x:za x:zb
#1 1 0.7635935 0.0000000
#2 1 -0.7990092 0.0000000
#3 1 -1.1476570 0.0000000
#4 1 -0.2894616 0.0000000
#5 1 -0.2992151 0.0000000
#6 1 0.0000000 -0.4115108
#7 1 0.0000000 0.2522234
#8 1 0.0000000 -0.8919211
#9 1 0.0000000 0.4356833
#10 1 0.0000000 -1.2375384
Oh, amazing model.matrix.default. It is able to make the right decision!
Let me reiterate it: there is no way to drop a main effect when an interaction between two factors is present.
I will not provide an extra example here, as I have one in Why do I get NA coefficients and how does lm drop reference level for interaction. See the "Contrasts for interaction" section over there. In short, all of the following specifications give the same model (they have the same fitted values):
~ year:treatment
~ year:treatment + 0
~ year + year:treatment
~ treatment + year:treatment
~ year + treatment + year:treatment
~ year * treatment
And in particular, the 1st specification leads to an NA coefficient.
So once the RHS of ~ contains a year:treatment interaction, you can never ask model.matrix.default to drop the main effects.
People unfamiliar with this behavior are often surprised when producing ANOVA tables.
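As a small illustration (a sketch reusing fit1 and fit2 from the example above): the two fits are the same model, yet their sequential ANOVA tables split the degrees of freedom across different terms.
anova(fit1)  ## rows: x (1 Df), x:z (1 Df), Residuals
anova(fit2)  ## rows: x:z (2 Df), Residuals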
Bypassing model.matrix.default
Some people consider model.matrix.default annoying because it does not appear to do dummy encoding in a consistent manner. A "consistent manner", in their view, would be to always drop the 1st factor level. Well, no problem: you can bypass model.matrix.default by doing the dummy encoding manually and feeding the resulting dummy matrix as a variable to lm, etc.
However, you still need model.matrix.default's help to easily do dummy encoding for a single (yes, only one) factor variable. For example, for the variable z in our previous example, its full dummy encoding is the following, and you can retain all or some of its columns for regression.
Z <- model.matrix.default(~ z + 0) ## no contrasts (as there is no intercept)
# za zb
#1 1 0
#2 1 0
#3 1 0
#4 1 0
#5 1 0
#6 0 1
#7 0 1
#8 0 1
#9 0 1
#10 0 1
#attr(,"assign")
#[1] 1 1
#attr(,"contrasts")
#attr(,"contrasts")$z
#[1] "contr.treatment"
Back to our simple example: if we don't want contrasts for z in y ~ x + x:z, we can do
Z2 <- Z[, 1:2] ## use "[" to remove attributes of `Z`
lm(y ~ x + x:Z2)
#Coefficients:
#(Intercept) x x:Z2za x:Z2zb
# 0.1989 -0.7082 0.5456 NA
Not surprisingly, we see an NA (because rowSums(Z2) is a column of ones, so the x:Z2za and x:Z2zb columns sum to x, aliased with the model term x). And if we want to enforce contrasts in y ~ x:z, we can do either of the following:
Z1 <- Z[, 1]
lm(y ~ x:Z1)
#Coefficients:
#(Intercept) x:Z1
# 0.34728 -0.06571
Z1 <- Z[, 2]
lm(y ~ x:Z1)
#Coefficients:
#(Intercept) x:Z1
# 0.2318 -0.6860
And the latter case is probably what contefranz is trying to do.
However, I do not really recommend this kind of hacking. When you pass a model formula to lm, etc., model.matrix.default tries to give you the most sensible construction. Also, in reality we usually want to make predictions with a fitted model. If you have done the dummy encoding yourself, you will have a hard time providing newdata to predict.
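For instance (a sketch using fit1 from the earlier example, with made-up new values): a formula-based fit rebuilds its design matrix from newdata automatically, whereas with a hand-rolled dummy matrix such as Z2 you would have to reconstruct the matching columns yourself.
newdat <- data.frame(x = c(0.5, -0.5), z = factor(c("a", "b"), levels = levels(z)))
predict(fit1, newdata = newdat)  ## works out of the box
## With the manual x:Z2 encoding, the dummy columns for the new observations
## would have to be rebuilt by hand before any prediction is possible.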