
Correctly parse "formula" object in R

How can I parse an R formula object (fo) correctly (i.e., not turning it into a string when parsing)?

Let's say I have the following:

## Creating a formula object
fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5

class(fo)
##[1] "formula"

typeof(fo)
##[1] "language"

strsplit(fo, split='|', fixed=TRUE)
##Error in strsplit(fo, split = "|", fixed = TRUE) : non-character argument

Ideally, I would like to parse it into three atomic vectors:

  1. Dependent variable: c("y").
  2. Regressors: c("x1", "x2").
  3. Others: c("z1", "z2", "z3", "z4", "z5") (excluding the 0).
Álvaro A. Gutiérrez-Vargas asked Aug 31 '25 02:08



1 Answer

The tree structure of the formula breaks down as follows:

  • top level: ~(y, right-hand side)

Internally, this is a list-like object where the first element is the operator (~), the second element is the first argument, and the third element is the second argument.

So deparse(fo[[2]]) gets you "y"

  • next level (right-hand side): |(x1 + x2, 0 + ...). Same general structure (first element is the operator |, second element is the first arg, third element is the second arg)

so fo[[c(3,2)]] gets x1+x2.
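The two levels described above can be checked interactively. This is only a sketch using the formula from the question; the name rhs is introduced here for illustration.

```r
fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5

# Top level: a call to `~` with two arguments
length(fo)   # three elements: operator, left-hand side, right-hand side
fo[[1]]      # the operator `~`
fo[[2]]      # the left-hand side, y
fo[[3]]      # the right-hand side, itself a call to `|`

# Next level: the right-hand side (rhs is an illustrative name)
rhs <- fo[[3]]
rhs[[1]]     # the operator `|`
rhs[[2]]     # x1 + x2
rhs[[3]]     # 0 + z1 + z2 + z3 + z4 + z5

# Recursive indexing reaches the same node in one step
identical(fo[[c(3, 2)]], rhs[[2]])
```

Indexing step by step (fo[[3]][[2]]) and indexing with a vector (fo[[c(3, 2)]]) should give the same node, so either form can be used.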

all.vars(fo[[c(3,2)]])

gets the variables to the left of the bar

all.vars(fo[[c(3,3)]])

gets the variables to the right of the bar
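Putting these pieces together, a small wrapper could return the three vectors asked for in the question. This is a minimal sketch, not a general formula parser: parse_bar_formula is a hypothetical helper name, and it assumes the formula has exactly the shape lhs ~ a | b with a single bar.

```r
# Hypothetical helper; assumes a two-sided formula of the form lhs ~ a | b
parse_bar_formula <- function(fo) {
  stopifnot(inherits(fo, "formula"), length(fo) == 3L)
  rhs <- fo[[3]]
  # Refuse anything whose right-hand side is not a call to `|`
  if (!(is.call(rhs) && identical(rhs[[1]], as.name("|")))) {
    stop("expected a right-hand side of the form a | b")
  }
  list(
    dependent  = all.vars(fo[[2]]),
    regressors = all.vars(rhs[[2]]),
    # all.vars() returns only names, so the constant 0 is dropped
    others     = all.vars(rhs[[3]])
  )
}

fo <- y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5
parts <- parse_bar_formula(fo)
parts$dependent
parts$regressors
parts$others
```

Checking for the | operator up front means a formula without a bar fails with a clear message instead of silently returning the wrong pieces.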

This gets considerably trickier if you want to extract terms rather than variables; for example, all.vars(quote(log(x))) gets "x", not "log(x)".

Possibly useful: you can also use lobstr::ast() to display the abstract syntax tree (AST):
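One possible way to get terms instead of variables is to walk the tree yourself, splitting on binary + and deparsing each leaf. The sketch below assumes the terms are joined only by +; split_terms and the example formula fo2 are hypothetical, and operators such as *, : or - are not expanded.

```r
# Hypothetical helper: split an expression on binary `+` and deparse each piece
split_terms <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3L) {
    c(split_terms(e[[2]]), split_terms(e[[3]]))
  } else {
    deparse(e)
  }
}

# Illustrative formula with transformed terms (not from the question)
fo2 <- y ~ log(x1) + x2 | 0 + z1 + I(z2^2)
rhs2 <- fo2[[3]]

all.vars(rhs2[[2]])      # variables only: "x1" "x2"
split_terms(rhs2[[2]])   # terms: "log(x1)" "x2"
split_terms(rhs2[[3]])   # terms: "0" "z1" "I(z2^2)"
```

Unlike all.vars(), this keeps the constant 0 as a term, so it has to be filtered out separately if it is not wanted.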

> lobstr::ast(y ~ x1 + x2 | 0 + z1 + z2 + z3 + z4 + z5)
█─`~` 
├─y 
└─█─`|` 
  ├─█─`+` 
  │ ├─x1 
  │ └─x2 
  └─█─`+` 
    ├─█─`+` 
    │ ├─█─`+` 
    │ │ ├─█─`+` 
    │ │ │ ├─█─`+` 
    │ │ │ │ ├─0 
    │ │ │ │ └─z1 
    │ │ │ └─z2 
    │ │ └─z3 
    │ └─z4 
    └─z5 
Ben Bolker answered Sep 02 '25 17:09
