I have many formulas (of class formula
or Formula
) of the form y ~ a*b
, where a
and b
are factors.
I need to write a function that takes such a formula and returns a formula with all of the terms in the interaction "spelled out." Here is an example:
fac1 <- factor(c('a', 'a', 'b', 'b'))
fac2 <- factor(c('c', 'd', 'c', 'd'))
BigFormula(formula(x ~ fac1*fac2))
where BigFormula
returns formula(x ~ a + b + c + d + a:c + a:d + b:c + b:d)
.
Is there a simple way to do this?
(The context: I am running many commands of the form anova(mod1, mod2)
, where mod2
nests in mod1
, and where the right-hand side of both models contains terms like fac1*fac2
. The point of these commands is to calculate F-statistics. The problem is that anova
treats fac1*fac2
as three variables, even though it usually represents more than three variables. (In the code above, for example, fac1*fac2
represents eight variables.) As a result, anova
underestimates the number of restrictions in the nested model, and it overestimates my degrees of freedom.)
I still have yet to learn all the tricks of formula, but if I want explicit formulas I'll tend to use sapply along with pasting:
# the factors
fac1 <- factor(c('a', 'a', 'b', 'b'))
fac2 <- factor(c('c', 'd', 'c', 'd'))
# create all the interaction terms
out <- sapply(levels(fac1), function(ii) {
sapply(levels(fac2), function(jj) {
paste0(ii,":",jj)
})
})
# along with the single terms
terms <- c(levels(fac1), levels(fac2), as.vector(out))
# and create the rhs of the formula
rhs <- paste0(terms, collapse=" + ")
# finally add the lhs
f <- paste0("x ~ ", rhs)
We end up with:
> f
[1] "x ~ a + b + c + d + a:c + a:d + b:c + b:d"
Look at the help for formula
there may be existing things that will work for you.
For example the formula y ~ (a + b + c + d)^2
will give you all main effects and all 2 way interactions and the formula y ~ (a + b) * (c + d)
gives the expansion that you show above.
You can also subtract terms so y ~ a*b*c - a:b:c
will not include the 3 way interaction.
How about the following solution. I use a more extreme example of a complex interaction.
f = formula(y ~ a * b * c * d * e)
To spell out the interaction terms, we extract the terms from the value returned by terms.formula():
terms = attr(terms.formula(f), "term.labels")
which yields:
> terms
[1] "a" "b" "c" "d" "e" "a:b" "a:c"
[8] "b:c" "a:d" "b:d" "c:d" "a:e" "b:e" "c:e"
[15] "d:e" "a:b:c" "a:b:d" "a:c:d" "b:c:d" "a:b:e" "a:c:e"
[22] "b:c:e" "a:d:e" "b:d:e" "c:d:e" "a:b:c:d" "a:b:c:e" "a:b:d:e"
[29] "a:c:d:e" "b:c:d:e" "a:b:c:d:e"
And then we can convert it back to a formula:
f = as.formula(sprintf("y ~ %s", paste(terms, collapse="+")))
> f
y ~ a + b + c + d + e + a:b + a:c + b:c + a:d + b:d + c:d + a:e +
b:e + c:e + d:e + a:b:c + a:b:d + a:c:d + b:c:d + a:b:e +
a:c:e + b:c:e + a:d:e + b:d:e + c:d:e + a:b:c:d + a:b:c:e +
a:b:d:e + a:c:d:e + b:c:d:e + a:b:c:d:e
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With