Call to weight in lm() within function doesn't evaluate properly

I'm writing a function that requires a weighted regression. I've repeatedly been getting an error with the weights parameter, and I've created a minimal reproducible example you can find here:

wt_reg <- function(form, data, wts) {
  lm(formula = as.formula(form), data = data,
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))

This returns

Error in eval(extras, data, env) : object 'wts' not found

If you run this all separately, it works fine. I've dug into lm, and it appears the issue is a call to eval(mf, parent.frame()). Even though wts is in the parent.frame(), it doesn't appear to be evaluated correctly within the call. Here's a little more detail:

mf is assigned such that it's the same as

stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE)

When I run

parent.frame()$wts

it does return a numeric vector. But when I run

eval(stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE), parent.frame()) 

it doesn't.

I can run

stats::model.frame(formula = as.formula(parent.frame()$form), 
    data = parent.frame()$data, weights = parent.frame()$wts, 
    drop.unused.levels = TRUE)

and it works. You can test this yourself if you want using the example from the top.

Any thoughts? I really have no idea what's going on here...

asked Apr 11 '20 22:04 by be_green


1 Answer

Formulas are special in R in that they not only keep track of symbol/variable names, they also keep track of the environment where they were created. Check out

ff <- mpg ~ cyl
environment(ff)
# <environment: R_GlobalEnv>
foo <- function() {
  ff <- mpg ~ cyl
  environment(ff)
}
foo()
# <environment: 0x0000026172e505d8> private function environment (different each time)

The problem is that lm() looks up variables in the environment where the formula was created, not in the parent frame. Because you write the formula in the call to wt_reg, it is created in the global environment and holds on to that scope, while wts exists only in the function's scope. You can alter your function to set the formula's environment to the local function environment, and then everything should work:

wt_reg <- function(form, data, wts) {
  ff <- as.formula(form)
  environment(ff) <- environment()
  lm(formula = ff, data = data,
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
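As a quick sanity check, the sketch below compares the environment-based fix with a direct lm() call, and also shows one alternative that is not part of the answer above: copying the weights into a column of data, since model.frame() looks in data first. The names wt_reg_env, wt_reg_col and the column .wts are hypothetical, chosen only for this illustration.

```r
# Environment-based fix, as in the answer above (renamed for comparison).
wt_reg_env <- function(form, data, wts) {
  ff <- as.formula(form)
  environment(ff) <- environment()  # formula now sees the local 'wts'
  lm(formula = ff, data = data, weights = wts)
}

# Alternative sketch (an assumption, not the answer's method):
# store the weights in the data so they are found there first.
wt_reg_col <- function(form, data, wts) {
  data$.wts <- wts                  # '.wts' is a hypothetical column name
  lm(formula = as.formula(form), data = data, weights = .wts)
}

# Reference fit: formula and weights live in the same environment.
w <- seq_len(nrow(mtcars))
fit_direct <- lm(mpg ~ cyl, data = mtcars, weights = w)

fit_env <- wt_reg_env(mpg ~ cyl, data = mtcars, wts = w)
fit_col <- wt_reg_col(mpg ~ cyl, data = mtcars, wts = w)

# Both wrappers should reproduce the coefficients of the direct call.
all.equal(coef(fit_env), coef(fit_direct))
all.equal(coef(fit_col), coef(fit_direct))
```

The column variant leaves the formula's environment untouched, which may matter if the formula refers to other objects defined where it was written.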

The eval(mf, parent.frame()) you are referring to in lm() is calling model.frame() with your formula. And from the description on the ?model.frame help page: "All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame". So it is again looking in the environment of the formula, not the calling frame.
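The lookup rule quoted above can be seen from the other direction with a small sketch: here the weights exist only in the environment where the formula was created, and lm() still finds them even though it is called from elsewhere. The names make_formula and local_wts are hypothetical and used only for this demonstration.

```r
# The formula is created inside a function, so it captures that
# function's environment, which also contains 'local_wts'.
make_formula <- function() {
  local_wts <- seq_len(nrow(mtcars))
  mpg ~ cyl
}

ff <- make_formula()

# 'local_wts' lives in the formula's environment...
exists("local_wts", envir = environment(ff), inherits = FALSE)

# ...so the weights argument resolves there, even though the
# calling frame has no variable named 'local_wts'.
fit <- lm(ff, data = mtcars, weights = local_wts)
head(weights(fit))
```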

answered Sep 29 '22 17:09 by MrFlick