Use a weights argument in a list of lm lapply calls [duplicate]

Tags: r, lapply, lm

Here is my problem (with made-up data so that it is reproducible):

set.seed(42)
df<-data.frame("x"=rnorm(1000),"y"=rnorm(1000),"z"=rnorm(1000))
df2<-data.frame("x"=rnorm(100),"y"=rnorm(100),"z"=rnorm(100))
breaks<-c(-1000,-0.68,-0.01315,0.664,1000)
divider<-cut(df$x,breaks)
divider2<-cut(df2$x,breaks)
subDF<-by(df,INDICES=divider,data.frame)
subDF2<-by(df2,INDICES=divider2,data.frame)
reg<-lapply(subDF,lm,formula=x~.)
pre<-lapply(1:4,function(x){predict(reg[[x]],subDF2[[x]])})
lapply(1:4,function(x){summary(reg[[x]])$r.squared})

The above code works fine. Here is what it does: according to the values of x, I split df into 4 data frames and run a regression on each of them, so that I can predict values for another dataset. Splitting the data frame improves the prediction, because the range of x has a large impact in the actual data.

What I am trying to do is add a weights argument to the regression to give greater importance to the most recent data. For 250 observations, my weights argument is weights<-0.999^seq(250,1,by=-1). With a seed of 42 and the breaks above, each of the 4 subsets has 250 rows.

When I try reg<-lapply(subDF,lm,formula=x~.,weights=0.999^seq(250,1,by=-1)), I get this error:

Error in eval(expr, envir, enclos) : 
  ..2 used in an incorrect context, no ... to look in

This is quite strange, as lapply has a ... argument, used here for the formula, but it doesn't accept the weights.

So I really don't know how to add those weights. What should I correct in my code, or should I change it (almost) entirely to be able to use the weights?

For this example, and to make it (perhaps) easier, I chose the breaks so that the 4 subsets have the same size, but ideally the answer would also work when the 4 subsets have different sizes (with breaks<-c(-1000,-0.75,0,0.75,1000), for instance).

This post on CrossValidated describes much the same problem, but has no working solution, so it didn't help me.

etienne asked Nov 02 '15 14:11


2 Answers

Unfortunately, you have experienced first-hand what is arguably the nastiest kind of error in R: a non-standard evaluation (NSE) error.

After a bit of digging in the code, I think I have found the culprit. Let's take things one at a time.

First of all, let's have a look at the traceback():

weights <- 0.999^seq(250,1,by=-1)

lapply(subDF, lm, formula=x~., weights=weights)
Error in eval(expr, envir, enclos) : 
  ..2 used in an incorrect context, no ... to look in
> traceback()
8: eval(expr, envir, enclos)
7: eval(extras, data, env)
6: model.frame.default(formula = ..1, data = X[[1L]], weights = ..2, 
       drop.unused.levels = TRUE)
5: stats::model.frame(formula = ..1, data = X[[1L]], weights = ..2, 
       drop.unused.levels = TRUE)
4: eval(expr, envir, enclos)
3: eval(mf, parent.frame())
2: FUN(X[[1L]], ...)
1: lapply(subDF, lm, formula = x ~ ., weights = weights)
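
The ..1 and ..2 placeholders in frames 5 and 6 can be reproduced without lm. The following is a minimal sketch, not the actual lm code: record_call and forward are hypothetical helpers standing in for lm (which calls match.call() internally) and for lapply (which calls FUN(X[[i]], ...)).

```r
# Hypothetical stand-in for lm(): it only records how it was called.
record_call <- function(formula, data, ...) match.call()

# Hypothetical stand-in for lapply(): it forwards its ... to FUN.
forward <- function(X, FUN, ...) FUN(X, ...)

w <- c(1, 2, 3)
d <- data.frame(x = 1:3, y = 4:6)

# Arguments that arrive through ... as symbols or calls are recorded
# as ..1, ..2, and so on, not as the expressions originally typed.
cl <- forward(d, record_call, formula = y ~ x, weights = w)
print(cl)

# A ..N placeholder only makes sense in a frame that has a ... of its own.
# Evaluating it in the global environment fails.
res <- tryCatch(eval(cl[["weights"]], globalenv()),
                error = function(e) "error")
print(res)
```

The recorded call is harmless as long as it is evaluated in the frame that owns the ... (here, the frame of forward). The trouble starts when one of those placeholders is evaluated somewhere else.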

It looks like the problem occurs inside model.frame.default, so let's look at its source code. I will not post the whole thing, but if you type model.frame.default in the console, you will see somewhere in the middle:

extras <- substitute(list(...))
extranames <- names(extras[-1L])
extras <- eval(extras, data, env)

The last line is where it fails. The first line is where non-standard evaluation comes in: substitute captures the arguments in ... as an unevaluated call, i.e. an object to be evaluated later by eval, rather than as values. As the eval call shows, extras is evaluated in data first, and anything not found there is looked up in env. The formula is not affected, because it is a named argument of model.frame.default and so is not part of .... weights, however, is part of ..., and it is not a column of data. Therefore, eval will look for it in env. But what is env?

Apparently, env is an environment and is created within model.frame.default in the line:

env <- environment(formula)

So, what does this mean? Let's see another example:

xtest <- function(x) {
  new_func <- function(x) {
    env <- environment(x)
    print(env)
  }
  new_func(x)
} 

> xtest(x~z)
<environment: R_GlobalEnv>

In the function above I try to replicate in fewer lines what env will be in model.frame.default. As you can see, environment(formula) points to the global environment, because that is where the formula was written.

So that is where eval looks for ..2, i.e. the second argument passed in ... (the weights). Since there is no ... in the global environment, you get the error. I hope it is clear now!
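
The same lookup rule also suggests why wrapping lm in a function avoids the error. The sketch below assumes a hypothetical helper fit_inside and a small made-up data frame d: a formula written inside a function carries that call's frame as its environment, so a local weights vector is visible when the model frame is built.

```r
# A formula written at top level carries the environment in which
# this code is run.
f_top <- x ~ z
print(environment(f_top))

# Hypothetical helper: both the formula and the weights are created
# inside the function, so environment(f) is this call's frame.
fit_inside <- function(d) {
  w <- 0.999^seq(nrow(d), 1, by = -1)
  f <- x ~ .
  # `w` lives in the formula's own environment
  print(exists("w", envir = environment(f), inherits = FALSE))
  lm(f, data = d, weights = w)
}

set.seed(42)
d <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20))
fit <- fit_inside(d)
```

Here weights = w is passed by name directly to lm rather than forwarded through ..., so no ..N placeholder is involved, and w is found in the formula's environment after the lookup in data fails.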

The best solution, and what I would do, is to use @Heroka's answer (you could also rewrite model.frame.default and lm from scratch without NSE, but the first option is more reasonable :) ).

LyzandeR answered Oct 12 '22 17:10


I don't know why you got that error (I thought the ... argument was made for that). However, I found a small workaround; is this in the direction of what you need? I created an anonymous function inside lapply that calculates the weights (based on the number of rows in each subset) and returns a model.

reg2 <- lapply(subDF, function(chunk){
  #calculate weights (!dependent on data ordering)
  weights <- 0.999^seq(nrow(chunk),1,by=-1)

  #fit model
  fit <- lm(x~., data=chunk, weights=weights)
  return(fit)
})
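
As a rough end-to-end check, the same idea can be run with the unequal breaks mentioned in the question. This is a sketch under a few assumptions: fit_weighted is a hypothetical name for the anonymous function above, split() is swapped in for by() to obtain a plain list of data frames, and Map() replaces the index-based lapply for the predictions.

```r
set.seed(42)
df  <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
df2 <- data.frame(x = rnorm(100),  y = rnorm(100),  z = rnorm(100))

# Breaks from the question that give subsets of different sizes
breaks <- c(-1000, -0.75, 0, 0.75, 1000)
subDF  <- split(df,  cut(df$x,  breaks))
subDF2 <- split(df2, cut(df2$x, breaks))

# Weights are rebuilt per subset, so their length always matches
# the number of rows (still dependent on row ordering).
fit_weighted <- function(chunk) {
  w <- 0.999^seq(nrow(chunk), 1, by = -1)
  lm(x ~ ., data = chunk, weights = w)
}

reg2 <- lapply(subDF, fit_weighted)

# Predict each subset of df2 with the model fitted on the matching bin
pre2 <- Map(predict, reg2, subDF2)

# R-squared of each weighted fit
r2 <- sapply(reg2, function(fit) summary(fit)$r.squared)
print(r2)
```

Because the weights depend only on nrow(chunk), nothing needs to change when the breaks do.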
Heroka answered Oct 12 '22 16:10