I'm writing my first S3 class and its associated methods, and I would like to know how to subset my input data set so that it keeps only the variables specified in a formula.
data(iris)
f <- Species~Petal.Length + Petal.Width
With model.frame(f, iris) I get a subset with all the variables in the formula. How can I automatically keep only the right-hand-side variables (in this example, Petal.Length and Petal.Width)?
You want labels and terms; see ?labels, ?terms, and ?terms.object.
labels(terms(f))
# [1] "Petal.Length" "Petal.Width"
In particular, labels.terms returns the "term.labels" attribute of a terms object, which excludes the LHS variable.
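To go from the labels to the actual subset, the character vector can index the data frame directly. This is a minimal sketch, assuming every term is a plain column name (the names rhs and iris_rhs are only for illustration). With a transformed term such as log(Petal.Length), the label is the text of the call rather than a column name, so this indexing would not work there:

```r
data(iris)
f <- Species ~ Petal.Length + Petal.Width

# Term labels of the right-hand side (the response is not included)
rhs <- labels(terms(f))

# Keep only those columns; drop = FALSE keeps a data frame even for one term
iris_rhs <- iris[, rhs, drop = FALSE]
head(iris_rhs, 3)

# Labels are terms, not variables: a transformed term keeps its call text
labels(terms(Species ~ log(Petal.Length) + Petal.Width))
```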
If your formula contains a function call, e.g., log, and you want to subset the data frame by the underlying variables, you can use get_all_vars. It ignores the function and extracts the untransformed variables:
f2 <- Species ~ log(Petal.Length) + Petal.Width
get_all_vars(f2[-2], iris)
Petal.Length Petal.Width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
...
If you just want the variable names, all.vars is a very helpful function:
all.vars(f2[-2])
[1] "Petal.Length" "Petal.Width"
The [-2] drops the second element of the formula, which for a two-sided formula is the left-hand side.
One way is to use subsetting to remove the LHS from the formula. Then you can use model.frame on this:
f[-2]
~Petal.Length + Petal.Width
model.frame(f[-2],iris)
Petal.Length Petal.Width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
5 1.4 0.2
6 1.7 0.4
...
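One thing worth checking when choosing between model.frame and get_all_vars is how each treats transformed terms. The sketch below reuses the formula with log(Petal.Length) from the other answer (the names mf and av are only for illustration) and compares the column names each function returns:

```r
data(iris)
f2 <- Species ~ log(Petal.Length) + Petal.Width

# model.frame() evaluates each term, so the first column holds logged values
mf <- model.frame(f2[-2], iris)
names(mf)

# get_all_vars() returns the untransformed input columns instead
av <- get_all_vars(f2[-2], iris)
names(av)
```

For a formula with no transformations, such as f above, the two give the same columns.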
The package formula.tools has a number of functions to make life easier working with formulas. In your case:
formula.tools::rhs.vars(f)
[1] "Petal.Length" "Petal.Width"
Relying on positional subsetting in base R can be dangerous because the left-hand side can be missing. In a one-sided formula, element 2 is the right-hand side rather than the response, so f[-2] would drop the wrong part.
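If adding a package dependency is not an option, base R can still handle both cases without positional indexing. The following is a sketch only: rhs_vars is a hypothetical helper name, not part of base R or formula.tools. It relies on delete.response, which removes the response from a terms object only when one is present:

```r
# Hypothetical helper: names of the right-hand-side variables of a formula,
# whether or not the formula has a response
rhs_vars <- function(formula, data = NULL) {
  tt <- terms(formula, data = data)  # data is only needed to expand `.`
  all.vars(delete.response(tt))
}

data(iris)
rhs_vars(Species ~ log(Petal.Length) + Petal.Width)  # two-sided formula
rhs_vars(~ Petal.Length + Petal.Width)               # one-sided formula

# The length of a formula distinguishes the two cases
length(Species ~ Petal.Width)  # 3: `~`, LHS, RHS
length(~ Petal.Width)          # 2: `~`, RHS
```

The result can then index the data directly, e.g. iris[, rhs_vars(f), drop = FALSE].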