
Why does this simple function calling `lm(..., subset)` fail?

Tags:

function

r

I am working on a custom function that includes a call to lm(), but for some reason the function fails. I can't make any sense of why it fails.

Consider this example, simplified to the bare bones:

myfun <- function(form., data., subs., ...){
    lm(form., data., subs., ...)
}

This results in an error:

myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'subs.' not found

However, calling lm() directly works just fine:

lm(mpg ~ cyl + hp, mtcars, TRUE)
## 
## Call:
## lm(formula = mpg ~ cyl + hp, data = mtcars, subset = TRUE)
## 
## Coefficients:
## (Intercept)          cyl           hp  
##    36.90833     -2.26469     -0.01912  

I tried debugging, but still can't get to the bottom of the problem. Why does the custom function fail? Clearly subs. has been supplied to the function...


Edit:

While most of the solutions suggested below help in this simple case, the function still fails if I add a simple twist. For instance, expand.model.frame() relies on the formula's environment, and it fails if I use the standard evaluation solution:

myfun <- function(form., data., subs., ...){
    fit <- lm(form., data.[ subs., ], ...)
    expand.model.frame(fit, ~ drat)
}

myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'data.' not found

This is obviously related to the original issue, but I can't figure out how. Is the environment of the model formula somehow corrupted?

landroni asked May 19 '16 14:05


2 Answers

As suggested in the comments, another solution would be to avoid the subset argument altogether in non-interactive use, and use standard evaluation instead:

myfun <- function(form., data., subs., ...){
    lm(form., data.[ subs., ], ...)
}

Now this works as expected:

myfun(formula(mpg ~ cyl + hp), mtcars, TRUE)

However, this still won't be enough if your custom function subsequently calls expand.model.frame() or similar functions, which re-evaluate the data argument of the original call in the formula's environment. To make the function robust and avoid surprises, you need to both (1) define the formula within the custom function (see also the reformulate approach) and (2) subset the data prior to the lm() call, deliberately avoiding the subset argument.

Like this:

myfun <- function(form., data., subs., ...){
    stopifnot(is.character(form.))
    data. <- data.[ subs., ]
    fit <- lm(as.formula(form.), data., ...)
    expand.model.frame(fit, ~ drat)
}

myfun("mpg ~ cyl + hp", mtcars, TRUE)

I tried using either (1) or (2) alone, but still ran into strange errors from some functions; only with both (1) and (2) did the errors seem to go away.
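To see why defining the formula inside the function matters, it may help to inspect where the fitted model's formula lives. The sketch below is only an illustration: myfun3 is a hypothetical name, and the returned list exists purely for inspection. It relies on expand.model.frame() evaluating the data expression stored in the call (here the symbol data.) in the formula's environment.

```r
# Hypothetical diagnostic variant of the function above (not from the original post).
myfun3 <- function(form., data., subs., ...) {
    stopifnot(is.character(form.))
    data. <- data.[subs., ]
    # as.formula() is evaluated in this function's frame, so the resulting
    # formula carries this frame as its environment.
    fit <- lm(as.formula(form.), data., ...)
    env <- environment(formula(fit))
    list(
        data_expr  = fit$call$data,   # the unevaluated symbol stored in the call
        data_found = exists("data.", envir = env, inherits = FALSE),
        expanded   = expand.model.frame(fit, ~ drat)
    )
}

res <- myfun3("mpg ~ cyl + hp", mtcars, mtcars$cyl > 4)
res$data_expr
res$data_found
names(res$expanded)
```

When the formula is instead created by the caller, its environment is the caller's, where no object named data. exists, which is consistent with the error shown in the question.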

landroni answered Sep 30 '22 11:09


This function fails because of the way the subset argument is evaluated. From ?lm:

All of ‘weights’, ‘subset’ and ‘offset’ are evaluated in the same way as variables in ‘formula’, that is first in ‘data’ and then in the environment of ‘formula’.

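In other words, lm looks for a variable named subs. first in data and then in the environment of formula, which is the environment where the formula was created (the caller's), not the body of myfun. Since subs. exists only inside myfun, it is found in neither place and lm produces an error.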
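The lookup rule can be checked with a small experiment. This is a sketch rather than a recommended fix: myfun2 is a hypothetical name, and it assumes you are willing to overwrite the formula's environment with the function's own frame so that subs. becomes visible to lm().

```r
# Hypothetical variant: re-point the formula's environment at this
# function's frame, where subs. is defined.
myfun2 <- function(form., data., subs., ...) {
    environment(form.) <- environment()
    lm(form., data., subs., ...)
}

# subs. is now found via the formula's environment, so the subset is applied.
fit <- myfun2(mpg ~ cyl + hp, mtcars, mtcars$cyl > 4)
nobs(fit)
```

The trade-off is that variables the formula refers to are then looked up starting from myfun2's frame rather than from the caller's environment, so objects local to a calling function would no longer be found.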

Ernest A answered Sep 30 '22 10:09