Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Function within Function in R

Tags:

r

Can you please explain to me why the code complains saying that Samdat is not found?

I am trying to switch between models, so I declared a function that contains these specific models and I just need to call this function as one of the argument in the get.f function where the resampling will change the structure for each design matrix in the model. The code complains that Samdat is not found when it is found.

Also, is there a way I can make the conditional statement if(Model == M1()) instead of having to create another argument M to set if(M==1)?

Here is my code:

dat <-  cbind(Y=rnorm(20),rnorm(20),runif(20),rexp(20),rnorm(20),runif(20), rexp(20),rnorm(20),runif(20),rexp(20))
nam <- paste("v",1:9,sep="")
colnames(dat) <- c("Y",nam)

M1 <- function(){
    a1 = cbind(Samdat[,c(2:5,7,9)])
    b1 = cbind(Samdat[,c(2:4,6,8,7)])
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)}

M2 <- function(){
    a1= cbind(Samdat[,c(2:5,7,9)])+2
    b1= cbind(Samdat[,c(2:4,6,8,7)])+2
    c1 = a1+b1
    list(a1=a1,b1=b1,c1=c1)}

M3 <- function(){
    a1= cbind(Samdat[,c(2:5,7,9)])+8
    b1= cbind(Samdat[,c(2:4,6,8,7)])+8
    c1 = a1+b1
    list(a1=a1,b1=b1,c1=c1)}
#################################################################
get.f <- function(asim,Model,M){
    sse <-c()
    for(i in 1:asim){
        set.seed(i)
        Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
        Y <- Samdat[,1]
        if(M==1){
            a2 <- Model$a1
            b2 <- Model$b1
            c2 <- Model$c1
            s<- a2+b2+c2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            coff <-Model$cof
            sse <-c(sse,coff)
        }
        else if(M==2){
            a2 <- Model$a1
            b2 <- Model$b1
            c2 <- Model$c1
            s<- c2+12
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            coff <-Model$cof
            sse <-c(sse,coff)
        }
        else {
            a2 <- Model$a1
            b2 <- Model$b1
            c2 <- Model$c1
            s<- c2+a2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            coff <- Model$cof
            sse <-c(sse,coff)
        }
    }
    return(sse)
}

get.f(10,Model=M1(),M=1)
get.f(10,Model=M2(),M=2)
get.f(10,Model=M3(),M=3)
like image 538
Falcon-StatGuy Avatar asked Sep 01 '12 12:09

Falcon-StatGuy


People also ask

Can I have a function within a function in R?

A nested function or the enclosing function is a function that is defined within another function. In simpler words, a nested function is a function in another function. There are two ways to create a nested function in the R programming language: Calling a function within another function we created.

What does the Within function do in R?

R within() function calculates the outcome of the expression within itself but with a slight difference. It allows us to create a copy of the data frame and add a column that would eventually store the result of the R expression.

Can functions be nested in R?

As you become more comfortable with R, you will find that it is more efficient to code using nested functions, or functions within other functions, which will allow you to execute multiple commands at the same time.

Can a function be within a function?

A nested function is a function that is completely contained within a parent function. Any function in a program file can include a nested function. The primary difference between nested functions and other types of functions is that they can access and modify variables that are defined in their parent functions.


2 Answers

You might want to have a look at the R scoping rules. In particular, there's no reason to expect that variables you define in a function are visible in other functions.

You might be confused because the global environment (i.e. the top-level outside all functions) is an exception from this rule. I'm not going to go into your other questions, but let me note that the entire script looks very messed up to me - i.e. M1 to M3 are essentially one function, and the wad of copy/paste in get.f is definitely terrible. Whatever it is that you're trying to do can definitely be written in a less convoluted way.

Let's have a look at the Ms first - why not one function with a parameter? Including the solution to your scope problem, that makes two parameters -

M <- function(sampleData, offset) { 
    a1 = sampleData[,c(2:5,7,9)] + offset
    b1 = sampleData[,c(2:4,6,8,7)] + offset
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)
}

If you insist on defining aliases, you can also do something like

M1 <- function(sampleData) M(sampleData, 0) 
M2 <- function(sampleData) M(sampleData, 2) 
M3 <- function(sampleData) M(sampleData, 8) 

This is already less repetitive, but ideally you want the computer to do the repetition for you (DRY!):

offsets <- c(0,2,8)
Models <- sapply(offsets, FUN=function(offset) function(sampleData) M(sampleData, offset))

Looking at get.f, it's not quite clear what you're trying to do - you're trying to fit something and collect something from the results, but the part about Model$cof refers to an undefined variable (your Model just has a1,b1 and c1 entries). Assuming you want to actually collect cof and discarding the interim code, get.f probably looks like this:

M <- function(sampleData, offset) { 
    a1 = sampleData[,c(2:5,7,9)] + offset
    b1 = sampleData[,c(2:4,6,8,7)] + offset
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)
}

get.f <- function(asim,Model,M){
    sse <-c()
    for(i in 1:asim){
        set.seed(i)
        Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
        Y <- Samdat[,1]
        model <- Model()
        if(M==1){
            a2 <- model$a1
            b2 <- model$b1
            c2 <- model$c1
            s<- a2+b2+c2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            sse <-c(sse,cof)
        }
        else if(M==2){
            a2 <- model$a1
            b2 <- model$b1
            c2 <- model$c1
            s<- c2+12
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            sse <-c(sse,cof)
        }
        else {
            a2 <- model$a1
            b2 <- model$b1
            c2 <- model$c1
            s<- c2+a2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            sse <-c(sse,cof)
        }
    }
    return(sse)
}


get.f(10,Model=M1,M=1) 
get.f(10,Model=M2,M=2)
get.f(10,Model=M3,M=3)

That's still terribly repetitive, so why don't we think about it for a minute? All you're doing with your samples is to calculate one column from them to use in your fit. I don't see why you need to do the calculation in an M function and then do the extraction of the single value in get.f (dependent on which particular M you were using) - this seems indicative that the extraction should much rather be part of M... but if you need to keep them separate, okay, let's use separate extraction functions. Still comes in under half of your code size in reasonably-written R:

# Set up test data
dat <-  cbind(Y=rnorm(20),rnorm(20),runif(20),rexp(20),rnorm(20),runif(20), rexp(20),rnorm(20),runif(20),rexp(20))
nam <- paste("v",1:9,sep="")
colnames(dat) <- c("Y",nam)

# calculate a1..c1 from a sample
M <- function(sampleData, offset) { 
    a1 = sampleData[,c(2:5,7,9)] + offset
    b1 = sampleData[,c(2:4,6,8,7)] + offset
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)
}

# create a fixed-offset model from the variable offset model by fixing offset
makeModel <- function(offset) function(sampleData) M(sampleData, offset)   

# run model against asim subsamples of data and collect coefficients
get.f <- function(asim,model,expected) 
    sapply(1:asim,  function (i){
        set.seed(i)
        Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
        Y <- Samdat[,1]
        s <- expected(model(Samdat))
        fit <- lm(Y~s)
        sum(summary(fit)$coef[,1])
    })

# list of models to run and how to extract the expectation values from the model reslts
todo <- list(
        list(model=makeModel(0), expected=function(data) data$a1+data$b1+data$c1),
        list(model=makeModel(2), expected=function(data) data$c1+12),
        list(model=makeModel(8), expected=function(data) data$c1+data$a1))

sapply(todo, function(l) { get.f(10, l$model, l$expected)})
like image 69
themel Avatar answered Oct 21 '22 12:10

themel


When you call

get.f(10, Model=M1(), M=1)

your M1 function is immediately called. It dies because inside the body of M1 you are using Samdat which is only defined later, in the body of get.f.

Somehow, you need to call M1 after Samdat is defined. One way of doing that is to make M1 (the function) an argument to get.f and call the function from inside get.f:

get.f <- function(asim, Model.fun, M) {
   ...
   Sambat <- ...
   Model  <- Model.fun()
   ...
}
get.f(10, Model.fun = M1, M=1)

Also, in general, it is bad programming to use global variables, i.e., make your function use variables that are defined outside their scope. Instead, it is recommended that everything a function uses be passed as input arguments. You have two such cases in your code: M1 (M2, and M3) use Samdat and get.f uses dat. They should be arguments to your functions. Here is a nicer version of your code. I have not fixed everything, so you'll have to do a little more to get it to work:

M1 <- function(sampled.data) {
   a1 <- sampled.data[, c("v1", "v2", "v3", "v4", "v6", "v8")]
   b1 <- sampled.data[, c("v1", "v2", "v3", "v5", "v7", "v6")]
   c1 <- a1 + b1
   list(a1 = a1, b1 = b1, c1 = c1)
}

get.f <- function(dat, asim, Model.fun, offset, M) {
   sse <- c()
   for(i in 1:asim){
      set.seed(i)
      Samdat <- dat[sample(1:nrow(dat), nrow(dat), replace = TRUE), ]
      Y      <- Samdat[, "Y"]
      Model  <- Model.fun(sampled.data = Samdat)
      a2     <- Model$a1
      b2     <- Model$b1
      c2     <- Model$c1      
      s      <- switch(M, a2 + b2 + c2, c2 + 12, c2 + a2)
      fit    <- lm(Y ~ s)
      cof    <- sum(summary(fit)$coef[,1])
      coff   <- Model$cof        # there is a problem here...
      sse    <- c(sse, coff)     # this is not efficient
   }
   return(sse)
}

dat <- cbind(Y = rnorm(20), v1 = rnorm(20), v2 = runif(20), v3 = rexp(20),
                            v4 = rnorm(20), v5 = runif(20), v6 = rexp(20),
                            v7 = rnorm(20), v8 = runif(20), v9 = rexp(20))

get.f(dat, 10, Model.fun = M1, M = 1)

One last thing that jumps out: if the definition of s (what I gathered under switch() is related to the Model you use, then consider merging the definitions of Model and s together: add s to the list output of your M1, M2, M3 functions so that s can just be defined as s <- Model$s, and you can then drop the M input to get.f.

like image 6
flodel Avatar answered Oct 21 '22 11:10

flodel