How can I parse an R formula object (fo) correctly, i.e., without turning it into a string first?
Let's say I have the following:
## Creating a formula object
fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5
class(fo)
##[1] "formula"
typeof(fo)
##[1] "language"
strsplit(fo, split='|', fixed=TRUE)
##Error in strsplit(fo, split = "|", fixed = TRUE) : non-character argument
Ideally, I would like to parse it into three atomic vectors:
c("y")
c("x1", "x2")
c("z1", "z2", "z3", "z4", "z5")
(excluding the 0).
The tree structure of the formula breaks down as follows:
~ (arguments: y, the response, and the right-hand side). Internally, this is a list-like object where the first element is the operator (~), the second element is the first argument, and the third element is the second argument. So deparse(fo[[2]]) gets you "y".
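A minimal sketch of that indexing, run on the fo from the question. Nothing here is converted to a string until deparse() is applied to the one piece that is needed; the comments describe what each line returns.

```r
fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5

# A two-sided formula is a call with three elements:
# [[1]] the operator, [[2]] its first argument, [[3]] its second argument
length(fo)     # 3
fo[[1]]        # the symbol `~`
fo[[2]]        # the symbol y (a language object, not a string)
fo[[3]]        # the call x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5

# Only deparse the piece you actually need
response <- deparse(fo[[2]])
response       # "y"
```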
| (arguments: x1+x2 and 0 + ...). Same general structure (the first element is the operator |, the second element is the first argument, the third element is the second argument), so fo[[c(3,2)]] gets x1+x2.
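The same idea one level down: the right-hand side is itself a call to `|`. This sketch shows that indexing with a vector, as in fo[[c(3, 2)]], simply recurses into the nested call, so it is equivalent to fo[[3]][[2]]. The variable names rhs, left and right are just for illustration.

```r
fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5

rhs   <- fo[[3]]   # the `|` call
rhs[[1]]           # the operator: the symbol `|`
left  <- rhs[[2]]  # x1 + x2
right <- rhs[[3]]  # 0 + z1 + z2 + z3 + z4 + z5

# [[ with a vector index recurses: fo[[c(3, 2)]] is fo[[3]][[2]]
identical(fo[[c(3, 2)]], left)   # TRUE
identical(fo[[c(3, 3)]], right)  # TRUE
```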
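all.vars(fo[[c(3,2)]]) gets the variables to the left of the bar, and all.vars(fo[[c(3,3)]]) gets the variables to the right of the bar. The 0 is a numeric constant rather than a variable, so all.vars() leaves it out automatically.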
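Putting those pieces together gives the three vectors asked for in the question. This is a sketch; split_bar_formula() is a hypothetical helper name, and it assumes the formula has exactly the shape lhs ~ a | b.

```r
# Hypothetical helper: split a formula of the form  lhs ~ a | b
# into the variable names of its three parts.
split_bar_formula <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  rhs <- f[[3]]
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|"))) {
    stop("right-hand side must have the form `a | b`")
  }
  list(
    response = all.vars(f[[2]]),   # variables on the left of ~
    left     = all.vars(rhs[[2]]), # variables before the bar
    right    = all.vars(rhs[[3]])  # variables after the bar (0 is a constant, so it is dropped)
  )
}

fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5
parts <- split_bar_formula(fo)
parts$response  # "y"
parts$left      # "x1" "x2"
parts$right     # "z1" "z2" "z3" "z4" "z5"
```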
This gets considerably trickier if you want to extract terms rather than variables; for example, all.vars(quote(log(x))) gets "x", not "log(x)".
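One possible way to get terms instead of variables, assuming each side of the bar is an ordinary model formula: wrap that side in a one-sided formula and let base R's terms() compute the term labels. The formula fo_log (with log(x1)) and the helper side_terms() are made up for illustration.

```r
fo_log <- y ~ log(x1) + x2 | 0 + z1 + z2

# Hypothetical helper: build the one-sided formula ~ side and ask terms()
# for its term labels (0 only removes the intercept, it is not a term)
side_terms <- function(side) {
  tt <- terms(eval(call("~", side)))
  attr(tt, "term.labels")
}

all.vars(fo_log[[c(3, 2)]])    # "x1" "x2"       (variables)
side_terms(fo_log[[c(3, 2)]])  # "log(x1)" "x2"  (terms)
side_terms(fo_log[[c(3, 3)]])  # "z1" "z2"
```

Because terms() is applied to each side separately, it never has to interpret the `|` itself.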
It may also be useful to display the abstract syntax tree (AST) with lobstr::ast():
> lobstr::ast(y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5)
█─`~`
├─y
└─█─`|`
├─█─`+`
│ ├─x1
│ └─x2
└─█─`+`
├─█─`+`
│ ├─█─`+`
│ │ ├─█─`+`
│ │ │ ├─█─`+`
│ │ │ │ ├─0
│ │ │ │ └─z1
│ │ │ └─z2
│ │ └─z3
│ └─z4
└─z5
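The AST shows that a chain like 0 + z1 + ... + z5 is a left-nested series of binary `+` calls. A small recursive walk over that shape is another way to collect the individual terms as language objects; flatten_plus() is a hypothetical helper, and it only splits on binary `+` (any other call, such as log(x), is kept whole as a leaf).

```r
# Hypothetical helper: walk a (left-nested) chain of binary `+` calls
# and return its leaves in left-to-right order, as a list.
flatten_plus <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3) {
    c(flatten_plus(e[[2]]), flatten_plus(e[[3]]))
  } else {
    list(e)
  }
}

fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5
leaves <- flatten_plus(fo[[c(3, 3)]])

# Drop numeric constants such as the 0, then deparse each remaining term
keep   <- !vapply(leaves, is.numeric, logical(1))
labels <- vapply(leaves[keep], deparse, character(1))
labels  # "z1" "z2" "z3" "z4" "z5"
```

The character(1) template in vapply() assumes each term deparses to a single line, which holds for short terms like these.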