Call to weight in lm() within function doesn't evaluate properly

Tags: non-standard-evaluation

I'm writing a function that requires a weighted regression. I keep getting an error with the weights parameter, and I've reduced it to the minimal reproducible example below:

wt_reg <- function(form, data, wts) {
  lm(formula = as.formula(form), data = data,
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))

This returns

Error in eval(extras, data, env) : object 'wts' not found

If you run this all separately, it works fine. I've dug into lm, and it appears the issue is a call to eval(mf, parent.frame()). Even though wts is in the parent.frame(), it doesn't appear to be evaluated correctly within the call. Here's a little more detail:

mf is assigned such that it's the same as

stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE)

When I run

parent.frame()$wts

it does return a numeric vector. But when I run

eval(stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE), parent.frame())

it doesn't.

I can run

stats::model.frame(formula = as.formula(parent.frame()$form), 
    data = parent.frame()$data, weights = parent.frame()$wts, 
    drop.unused.levels = TRUE)

and it works. You can test this yourself if you want using the example from the top.

Any thoughts? I really have no idea what's going on here...

asked Apr 11 '20 22:04

be_green

1 Answer

Formulas are special in R in that they not only keep track of symbol/variable names, they also keep track of the environment where they were created. Check out

ff <- mpg ~ cyl
environment(ff)
# <environment: R_GlobalEnv>
foo <- function() {
  ff <- mpg ~ cyl
  environment(ff)
}
foo()
# <environment: 0x0000026172e505d8> private function environment (different each time)

The problem is that lm will try to use the environment where the formula was created to look up variables, rather than the parent frame. Since you create the formula in the call to wt_reg, the formula holds on to the global scope, but wts only exists in the function scope. You can alter your function to change the environment of the formula to the local function environment, and then everything should work:

wt_reg <- function(form, data, wts) {
  ff <- as.formula(form)
  environment(ff) <- environment()
  lm(formula = ff, data = data,
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
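One side effect of replacing the formula's environment outright is that any variable the formula could previously see in the caller's scope (but not in data) is no longer reachable through the formula, unless it also happens to be visible from the function. A possible variation, sketched below under the assumption that this matters for your use case, is to give the formula a new child environment that holds the weights and keeps the original environment as its parent. The names wt_reg_child and .wts are hypothetical and chosen only for illustration; the last lines compare the result against a plain top-level lm() call as a sanity check.

```r
# Sketch: keep the formula's original scope chain and add the weights to it.
# `wt_reg_child` and `.wts` are hypothetical names used for illustration.
wt_reg_child <- function(form, data, wts) {
  ff <- as.formula(form)
  # New environment whose parent is the formula's original environment,
  # so variables visible where the formula was created stay reachable.
  env <- new.env(parent = environment(ff))
  assign(".wts", wts, envir = env)
  environment(ff) <- env
  # `.wts` is not a column of `data`, so model.frame() falls back to
  # the formula's environment, where it was just assigned.
  lm(formula = ff, data = data, weights = .wts)
}

fit_fn <- wt_reg_child(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))

# Reference fit at top level, where the weights vector is directly visible.
w <- 1:nrow(mtcars)
fit_direct <- lm(mpg ~ cyl, data = mtcars, weights = w)

# Both fits should produce the same coefficients.
all.equal(coef(fit_fn), coef(fit_direct))
```

Note that a formula such as mpg ~ . is unaffected by this, because the weights live in the environment rather than being added as a column of data.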

The eval(mf, parent.frame()) you are referring to in lm() is calling model.frame() with your formula. From the description on the ?model.frame help page: "All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame". So it is again looking in the environment of the formula, not the calling frame.
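To see this lookup rule in isolation, here is a small sketch that calls model.frame() directly, once with the formula's original environment and once after pointing it at the function's environment. The helper show_lookup is a hypothetical name used only for this illustration, and the sketch assumes there is no object called wts in the environment where the formula is created (otherwise the first attempt would find it).

```r
# `show_lookup` is a hypothetical helper for illustration only.
show_lookup <- function(form, data, wts) {
  # Attempt 1: the formula still carries its creator's environment, so the
  # local argument `wts` is not visible to model.frame().
  before <- tryCatch(
    {
      model.frame(form, data = data, weights = wts)
      "found"
    },
    error = function(e) conditionMessage(e)
  )

  # Attempt 2: after pointing the formula at this function's environment,
  # the same call can resolve `wts`.
  environment(form) <- environment()
  after <- tryCatch(
    {
      model.frame(form, data = data, weights = wts)
      "found"
    },
    error = function(e) conditionMessage(e)
  )

  c(before = before, after = after)
}

res <- show_lookup(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
res
```

Under that assumption, the "before" element should hold an error message mentioning wts, while the "after" element should be "found".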

answered Sep 29 '22 17:09

MrFlick
