If I run the following code:
ff <- ~ a + b + b:a
dd <- data.frame(a = 1:3, b = 1:3)
model.matrix(terms(ff, keep.order = TRUE), dd) |> head()
I get
(Intercept) a b a:b
1 1 1 1 1
2 1 2 2 4
3 1 3 3 9
How can I ensure the terms in the model matrix are the same as in the formula? For some reason, b:a is swapped to a:b.
This only happens if a also appears as a first-order effect. For example, here everything is as expected:
ff <- ~ b + b:a
dd <- data.frame(a = 1:3, b = 1:3)
model.matrix(terms(ff, keep.order = TRUE), dd) |> head()
(Intercept) b b:a
1 1 1 1
2 1 2 4
3 1 3 9
The proximal reason that the interaction label comes out as a:b is that R constructs the term labels from the variable list, which is ordered by first appearance in the formula. In ~ a + b + b:a, a appears first, so the interaction is labelled a:b. One possible workaround is to list the main effects in the same order as in the interaction (b before a):
ff <- ~ b + a + b:a
dd <- data.frame(a = 1:3, b = 1:3)
terms(ff, keep.order = TRUE) |> attr("term.labels")
[1] "b" "a" "b:a"
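To make the role of the variable list visible, here is a small sketch that compares the two orderings. The helper name show_terms is made up for illustration; it only reads the "factors" and "term.labels" attributes of the terms object, whose row names follow the order in which the variables first appear.

```r
# Hypothetical helper: report the variable order and the resulting term labels.
show_terms <- function(formula) {
  tt <- terms(formula, keep.order = TRUE)
  list(
    variables = rownames(attr(tt, "factors")),  # variables, in order of first appearance
    labels    = attr(tt, "term.labels")         # labels built from that variable order
  )
}

orig  <- show_terms(~ a + b + b:a)
fixed <- show_terms(~ b + a + b:a)

orig$variables   # "a" "b"
orig$labels      # "a" "b" "a:b"
fixed$variables  # "b" "a"
fixed$labels     # "b" "a" "b:a"

# The model matrix columns follow the same labelling.
dd <- data.frame(a = 1:3, b = 1:3)
mm_fixed <- model.matrix(terms(~ b + a + b:a, keep.order = TRUE), dd)
colnames(mm_fixed)  # "(Intercept)" "b" "a" "b:a"
```

Note that keep.order = TRUE controls the order of the terms, not the order of the variables within an interaction label.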
As to why the term labels are constructed like this: unfortunately, this behaviour is embedded in the C code of termsform (see the source code, which is hard to read/interpret).
I messed around trying to debug my way through the C code: according to the comments in the code, the relevant bit that constructs the term labels is here. A mish-mash of useful gdb commands:
R -d gdb
run
Ctrl-C [break to set breakpoint]
break model.c:2050
cont
<run R code above>
n <next>
p Rf_PrintValue(x) [to view a SEXP object]
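If the main effects have to stay in the order a, b, another option is to leave terms() alone and relabel the column afterwards. This is only a sketch, not something the terms machinery offers: it assumes numeric predictors as in the example data, and it changes the column name only, not the "term.labels" stored in the terms object.

```r
ff <- ~ a + b + b:a
dd <- data.frame(a = 1:3, b = 1:3)
mm <- model.matrix(terms(ff, keep.order = TRUE), dd)

# Rename the interaction column so it matches the formula as written.
colnames(mm)[colnames(mm) == "a:b"] <- "b:a"

colnames(mm)  # "(Intercept)" "a" "b" "b:a"
```

The values are unaffected, since the product of a and b is the same in either order.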