I am working on a custom function that includes a call to lm(), but for some reason the function fails, and I can't make sense of why.
Consider this example, simplified to the bare bones:
myfun <- function(form., data., subs., ...){
  lm(form., data., subs., ...)
}
Calling it results in an error:
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'subs.' not found
However, using lm() directly works just fine:
lm(mpg ~ cyl + hp, mtcars, TRUE)
##
## Call:
## lm(formula = mpg ~ cyl + hp, data = mtcars, subset = TRUE)
##
## Coefficients:
## (Intercept)          cyl           hp
##    36.90833     -2.26469     -0.01912
I tried debugging, but still can't get to the bottom of the problem. Why does the custom function fail? Clearly subs. has been supplied to the function...
While most of the solutions suggested below help in this simple case, the function still fails if I add a simple twist. For instance, expand.model.frame() relies on the formula's environment, and it fails if I use the standard evaluation solution:
myfun <- function(form., data., subs., ...){
  fit <- lm(form., data.[ subs., ], ...)
  expand.model.frame(fit, ~ drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'data.' not found
This is obviously related to the original issue, but I can't figure out how. Is the environment of the model formula somehow corrupted?
As suggested in the comments, another solution would be to avoid the subset argument altogether in non-interactive use, and use standard evaluation instead:
myfun <- function(form., data., subs., ...){
  lm(form., data.[ subs., ], ...)
}
Now this works as expected:
myfun(formula(mpg ~ cyl + hp), mtcars, TRUE)
However, this still won't be enough if your custom function subsequently calls expand.model.frame() or similar functions, which themselves seem to be sensitive to the non-standard evaluation of the subset argument. To make the function robust and avoid surprises, you need to both (1) define the formula within the custom function (see also the reformulate approach) and (2) subset the data prior to the lm() call while deliberately avoiding the subset argument, like this:
myfun <- function(form., data., subs., ...){
  stopifnot(is.character(form.))
  data. <- data.[ subs., ]
  fit <- lm(as.formula(form.), data., ...)
  expand.model.frame(fit, ~ drat)
}
myfun("mpg ~ cyl + hp", mtcars, TRUE)
I tried using either (1) or (2) alone, but still ran into strange errors from some functions; it's only with both (1) and (2) that the errors seem to have gone away...
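A rough way to see why both steps matter, assuming expand.model.frame() re-evaluates the data expression stored in the fitted model's call inside the formula's environment (which is what its default envir argument suggests): the sketch below inspects both pieces. fit_wrapper is a hypothetical helper used only for this check, not part of the solution above.

```r
# Hypothetical helper: fits the model the way step (1) does and reports
# (a) the unevaluated data expression stored in the call and
# (b) whether the formula's environment is this function's own frame.
fit_wrapper <- function(form_chr, data.) {
  fit <- lm(as.formula(form_chr), data.)
  list(
    data_expr = deparse(fit$call$data),
    formula_env_is_local = identical(environment(formula(fit)), environment())
  )
}

out <- fit_wrapper("mpg ~ cyl + hp", mtcars)
str(out)
```

If data_expr is just the name data. and the formula's environment is the function's frame, a later lookup of data. lands on the already-subsetted object inside the function. With a formula created at the call site, the same name would be searched for in the caller's environment instead, where it does not exist.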
The reason this function doesn't work is the way the subset argument is evaluated. As the documentation for lm() puts it:
All of ‘weights’, ‘subset’ and ‘offset’ are evaluated in the same way as variables in ‘formula’, that is first in ‘data’ and then in the environment of ‘formula’.
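In other words, lm looks for a variable named subs. in data and then in the environment of formula. The formula was created where myfun() was called, so its environment is the calling environment (here the global environment), whereas subs. exists only inside myfun(). Since there is no subs. variable in either of those places, lm produces an error.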