I'm writing a function that requires a weighted regression. I keep getting an error involving the weights parameter, and I've created a minimal reproducible example:
wt_reg <- function(form, data, wts) {
  lm(formula = as.formula(form), data = data,
     weights = wts)
}
wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
This returns
Error in eval(extras, data, env) : object 'wts' not found
If you run this all separately, it works fine. I've dug into lm, and it appears the issue is a call to eval(mf, parent.frame()). Even though wts is in the parent.frame(), it doesn't appear to be evaluated correctly within the call. Here's a little more detail:
mf is assigned such that it's the same as
stats::model.frame(formula = as.formula(form), data = data, weights = wts,
drop.unused.levels = TRUE)
When I run
parent.frame()$wts
it does return a numeric vector. But when I run
eval(stats::model.frame(formula = as.formula(form), data = data, weights = wts,
drop.unused.levels = TRUE), parent.frame())
it doesn't.
I can run
stats::model.frame(formula = as.formula(parent.frame()$form),
data = parent.frame()$data, weights = parent.frame()$wts,
drop.unused.levels = TRUE)
and it works. You can test this yourself if you want using the example from the top.
Any thoughts? I really have no idea what's going on here...
If the weights argument to lm and glm implemented frequency weights, the results for wei_lm and wei_glm would be identical to those from ind_lm. Only the point estimates are correct; the inference statistics are not. The model that uses a design with sampling weights, svy_glm, gives correct point estimates but incorrect inference.
So, it seems to me that the weights argument in lm gives an observation more weight the larger its 'weight' value, while the lme function in nlme does precisely the opposite. This can be verified with a simple simulation.
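A minimal sketch of such a simulation, under assumptions that are mine rather than the original poster's: it uses an intercept-only model, so the fitted coefficient is simply a weighted mean, and it substitutes nlme::gls with varFixed for lme to avoid needing a random effect (both take their weights as a variance function). The data and variable names are illustrative only.

```r
set.seed(1)
d <- data.frame(y = rnorm(20), w = runif(20, 0.5, 5))

# lm: weights multiply the squared residuals, so a larger w means more influence.
lm_fit <- lm(y ~ 1, data = d, weights = w)
lm_matches <- isTRUE(all.equal(unname(coef(lm_fit)),
                               weighted.mean(d$y, d$w)))

# nlme: varFixed(~ w) declares the error variance proportional to w,
# so a larger w means less influence (effective weight 1 / w).
gls_matches <- NA
if (requireNamespace("nlme", quietly = TRUE)) {
  gls_fit <- nlme::gls(y ~ 1, data = d, weights = nlme::varFixed(~ w))
  gls_matches <- isTRUE(all.equal(unname(coef(gls_fit)),
                                  weighted.mean(d$y, 1 / d$w),
                                  tolerance = 1e-6))
}

lm_matches
gls_matches
```

If both comparisons hold, the lm intercept equals the mean weighted by w, while the nlme intercept equals the mean weighted by 1 / w.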
Description: lm is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although aov may provide a more convenient interface for these).
Formulas are special in R in that they keep track not only of symbol/variable names but also of the environment where they were created. Consider:
ff <- mpg ~ cyl
environment(ff)
# <environment: R_GlobalEnv>
foo <- function() {
  ff <- mpg ~ cyl
  environment(ff)
}
foo()
# <environment: 0x0000026172e505d8>  (the function's private environment; the address differs on each call)
The problem is that lm will use the environment where the formula was created, rather than the parent frame, to look up variables. Since you create the formula in the call to wt_reg, the formula holds on to the global environment. But wts only exists in the function's environment. You can alter your function to set the formula's environment to the local function environment, and then everything should work:
wt_reg <- function(form, data, wts) {
  ff <- as.formula(form)
  # Point the formula at this function's environment so that wts can be found
  environment(ff) <- environment()
  lm(formula = ff, data = data,
     weights = wts)
}
wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
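As a possible alternative (not the approach given above), the weights could be stored as a column of data, since variables are looked up in data before the formula's environment. This is only a sketch: the function name wt_reg_col and the column name .wts are hypothetical choices for illustration.

```r
# Hypothetical alternative: carry the weights inside the data frame itself.
wt_reg_col <- function(form, data, wts) {
  data[[".wts"]] <- wts  # modifies only the local copy of data
  lm(formula = as.formula(form), data = data, weights = .wts)
}

fit_fun <- wt_reg_col(mpg ~ cyl, data = mtcars, wts = seq_len(nrow(mtcars)))

# Reference fit made directly at the top level, where the weights are visible.
fit_direct <- lm(mpg ~ cyl, data = mtcars, weights = seq_len(nrow(mtcars)))

all.equal(coef(fit_fun), coef(fit_direct))
```

One caveat with this variant: a formula that uses the dot shorthand, such as mpg ~ ., would presumably pick up the extra .wts column as a predictor, so resetting the formula's environment is the less intrusive fix.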
The eval(mf, parent.frame()) you are referring to in lm() is calling model.frame() with your formula. From the ?model.frame help page: "All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame". So, again, it looks in the environment of the formula, not the calling frame.
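The lookup rule quoted from ?model.frame can be checked directly. In this small sketch, make_formula and w_local are made-up names used only for illustration: a weights variable that lives only in the formula's environment is found, while the same name is not found when the formula was created somewhere else.

```r
make_formula <- function() {
  w_local <- seq_len(nrow(mtcars))  # exists only inside this function
  mpg ~ cyl                         # the formula captures this function's environment
}

# w_local is not in mtcars, so it is looked up in environment(f_local).
f_local <- make_formula()
mf <- model.frame(f_local, data = mtcars, weights = w_local)
found <- isTRUE(all.equal(as.numeric(mf[["(weights)"]]),
                          as.numeric(seq_len(nrow(mtcars)))))

# A formula created here has no w_local in its environment, so the lookup fails.
f_other <- mpg ~ cyl
res <- tryCatch(model.frame(f_other, data = mtcars, weights = w_local),
                error = function(e) e)
failed <- inherits(res, "error")

found
failed
```

This has the same shape as the original problem: wts was visible in the function's frame, but the formula passed in from outside pointed at a different environment.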