Object not found error when passing model formula to another function

Tags:

formula

I have a weird problem with R that I can't seem to work out.

I've tried to write a function that performs K-fold cross validation for a model chosen by the stepwise procedure in R. (I'm aware of the issues with stepwise procedures, it's purely for comparison purposes) :)

The issue is that if I define the function parameters (linmod, k, direction) and run the body of the function line by line, it works flawlessly. But if I run it as a function, I get an error saying the datas.train object can't be found.

I've tried stepping through the function with debug() and the object clearly exists, but R says it doesn't when I actually run the function. If I just fit a model using lm() it works fine, so I believe it's a problem with the step function in the loop, while inside a function. (try commenting out the step command, and set the predictions to those from the ordinary linear model.)

#CREATE A LINEAR MODEL TO TEST FUNCTION
lm.cars <- lm(mpg~.,data=mtcars,x=TRUE,y=TRUE)


#THE FUNCTION
cv.step <- function(linmod,k=10,direction="both"){
  response <- linmod$y
  dmatrix <- linmod$x
  n <- length(response)
  datas <- linmod$model
  form <- formula(linmod$call)

  # generate indices for cross validation
  rar <- n/k
  xval.idx <- list()
  s <- sample(1:n, n) # permutation of 1:n
  for (i in 1:k) {
    xval.idx[[i]] <- s[(ceiling(rar*(i-1))+1):(ceiling(rar*i))]
  }

  #error calculation
  errors <- R2 <- 0

  for (j in 1:k){
    datas.test <- datas[xval.idx[[j]],]
    datas.train <- datas[-xval.idx[[j]],]
    test.idx <- xval.idx[[j]]

    #THE MODELS
    lm.1 <- lm(form,data= datas.train)
    lm.step <- step(lm.1,direction=direction,trace=0)

    step.pred <- predict(lm.step,newdata= datas.test)
    step.error <- sum((step.pred-response[test.idx])^2)
    errors[j] <- step.error/length(response[test.idx])

    SS.tot <- sum((response[test.idx] - mean(response[test.idx]))^2)
    R2[j] <- 1 - step.error/SS.tot
  }

  CVerror <- sum(errors)/k
  CV.R2 <-  sum(R2)/k

  res <- list()
  res$CV.error <- CVerror
  res$CV.R2 <- CV.R2

  return(res)
}


#TESTING OUT THE FUNCTION
cv.step(lm.cars)

Any thoughts?


asked Nov 21 '11 14:11

dcl

2 Answers

When you created the formula for lm.cars, it was assigned its own environment. This environment stays with the formula unless you explicitly change it, so when you extract the formula with the formula function, the original environment of the model is included. Inside cv.step, step() then looks for datas.train in that original environment, where it does not exist.
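As a rough illustration of the point above, the following sketch shows that a formula keeps a reference to the environment in which it was created, and that this reference can be changed. The names make_formula and local_weights are made up for this example and are not part of the question's code.

```r
# A formula created inside a function carries that function's frame with it.
make_formula <- function() {
  local_weights <- rep(1, nrow(mtcars))  # exists only inside this call
  f <- mpg ~ wt                          # created here, so it remembers this frame
  list(formula = f, frame = environment())
}

out <- make_formula()

# The formula's environment is the frame of the call that created it ...
same_env <- identical(environment(out$formula), out$frame)

# ... so objects local to that call are still reachable through the formula.
has_local <- exists("local_weights", envir = environment(out$formula),
                    inherits = FALSE)

# The environment can be replaced explicitly.
f2 <- out$formula
environment(f2) <- new.env()
moved <- !identical(environment(f2), out$frame)

print(c(same_env = same_env, has_local = has_local, moved = moved))
```

Modelling functions generally search the data argument first and then the formula's environment, which is why the place where a formula was created matters.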

I don't know if I'm using the correct terminology here, but I think you need to explicitly change the environment for the formula inside your function:

cv.step <- function(linmod,k=10,direction="both"){
  response <- linmod$y
  dmatrix <- linmod$x
  n <- length(response)
  datas <- linmod$model
  .env <- environment() ## identify the environment of cv.step

  ## extract the formula in the environment of cv.step
  form <- as.formula(linmod$call, env = .env) 

  ## The rest of your function follows
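
To see the effect in isolation, here is a minimal sketch rather than the full cv.step. The names fit_step, fix_env and train_local are hypothetical and chosen only for this illustration. It assumes, as in the question, that the formula was created outside the function while the training data exists only inside it.

```r
# Stepwise selection on data that is local to the function.
fit_step <- function(form, fix_env = FALSE) {
  train_local <- mtcars                 # visible only inside this function
  if (fix_env) {
    # Point the formula at this function's frame, as suggested above.
    environment(form) <- environment()
  }
  fit <- lm(form, data = train_local)
  step(fit, direction = "both", trace = 0)
}

form <- mpg ~ .                         # created outside fit_step

# Without the fix, step() ends up looking for train_local in the
# formula's original environment and signals an error.
res_fail <- tryCatch(fit_step(form), error = function(e) e)

# With the fix, the lookup succeeds and a fitted model is returned.
res_ok <- fit_step(form, fix_env = TRUE)

print(inherits(res_fail, "error"))
print(class(res_ok))
```

Reassigning the environment on a copy of the formula inside the function leaves the caller's formula untouched, since R copies the object on modification.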

answered Oct 27 '22 10:10

Tyler

Another problem that can cause this is passing a character string to lm instead of a formula. Character vectors have no environment, so when lm converts the string to a formula, the resulting formula apparently is not assigned the local environment either. If one then uses an object as weights that is not in the data.frame given as the data argument, but is an argument of the local function, one gets a "not found" error. This behavior is not very easy to understand. It is probably a bug.

Here's a minimal reproducible example. Each function takes a data.frame, two variable names and a vector of weights to use (the formula is hard-coded here, so the variable names are not actually used).

residualizer = function(data, x, y, wtds) {
  #the formula to use
  f = "x ~ y" 

  #residualize
  resid(lm(formula = f, data = data, weights = wtds))
}

residualizer2 = function(data, x, y, wtds) {
  #the formula to use
  f = as.formula("x ~ y")

  #residualize
  resid(lm(formula = f, data = data, weights = wtds))
}

d_example = data.frame(x = rnorm(10), y = rnorm(10))
weightsvar = runif(10)

And test:

> residualizer(data = d_example, x = "x", y = "y", wtds = weightsvar)
Error in eval(expr, envir, enclos) : object 'wtds' not found

> residualizer2(data = d_example, x = "x", y = "y", wtds = weightsvar)
         1          2          3          4          5          6          7          8          9         10 
 0.8986584 -1.1218003  0.6215950 -0.1106144  0.1042559  0.9997725 -1.1634717  0.4540855 -0.4207622 -0.8774290

It is a very subtle bug. If one goes into the function environment with browser(), one can see the weights vector just fine, but it somehow is not found in the lm call!
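
One way to check why the second variant works is to inspect the environment that as.formula attaches when it is called inside a function. This is only a sketch under the same setup as above. The names env_matches_caller and weighted_resid are invented for the illustration.

```r
# as.formula() called inside a function gives the formula that function's
# frame as its environment (its env argument defaults to parent.frame()).
env_matches_caller <- function() {
  f <- as.formula("x ~ y")
  identical(environment(f), environment())
}
print(env_matches_caller())

# Because of that, a weights vector that is only a function argument
# can be found when the model frame is built.
weighted_resid <- function(data, wtds) {
  f <- as.formula("x ~ y")
  resid(lm(f, data = data, weights = wtds))
}

set.seed(1)
d <- data.frame(x = rnorm(10), y = rnorm(10))
w <- runif(10)
r <- weighted_resid(d, w)
print(length(r))
```

Converting the string to a formula inside the function that owns the weights keeps the lookup local, which is the difference between the two variants shown earlier.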

The bug becomes even harder to debug if one uses the name weights for the weights variable. In this case, since lm can't find the weights object, it falls back to the function weights() from the stats package, thus throwing an even stranger error:

Error in model.frame.default(formula = f, data = data, weights = weights,  : 
  invalid type (closure) for variable '(weights)'

Don't ask me how many hours it took me to figure this out.

answered Oct 27 '22 12:10

CoderGuy123
