Function within Function in R

Tags:

r

Can you please explain to me why the code complains saying that Samdat is not found?

I am trying to switch between models, so I declared a function that contains these specific models and I just need to call this function as one of the argument in the get.f function where the resampling will change the structure for each design matrix in the model. The code complains that Samdat is not found when it is found.

Also, is there a way I can make the conditional statement if(Model == M1()) instead of having to create another argument M to set if(M==1)?

Here is my code:

dat <-  cbind(Y=rnorm(20),rnorm(20),runif(20),rexp(20),rnorm(20),runif(20), rexp(20),rnorm(20),runif(20),rexp(20))
nam <- paste("v",1:9,sep="")
colnames(dat) <- c("Y",nam)

M1 <- function(){
    a1 = cbind(Samdat[,c(2:5,7,9)])
    b1 = cbind(Samdat[,c(2:4,6,8,7)])
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)}

M2 <- function(){
    a1= cbind(Samdat[,c(2:5,7,9)])+2
    b1= cbind(Samdat[,c(2:4,6,8,7)])+2
    c1 = a1+b1
    list(a1=a1,b1=b1,c1=c1)}

M3 <- function(){
    a1= cbind(Samdat[,c(2:5,7,9)])+8
    b1= cbind(Samdat[,c(2:4,6,8,7)])+8
    c1 = a1+b1
    list(a1=a1,b1=b1,c1=c1)}
#################################################################
get.f <- function(asim,Model,M){
    sse <-c()
    for(i in 1:asim){
        set.seed(i)
        Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
        Y <- Samdat[,1]
        if(M==1){
            a2 <- Model$a1
            b2 <- Model$b1
            c2 <- Model$c1
            s<- a2+b2+c2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            coff <-Model$cof
            sse <-c(sse,coff)
        }
        else if(M==2){
            a2 <- Model$a1
            b2 <- Model$b1
            c2 <- Model$c1
            s<- c2+12
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            coff <-Model$cof
            sse <-c(sse,coff)
        }
        else {
            a2 <- Model$a1
            b2 <- Model$b1
            c2 <- Model$c1
            s<- c2+a2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            coff <- Model$cof
            sse <-c(sse,coff)
        }
    }
    return(sse)
}

get.f(10,Model=M1(),M=1)
get.f(10,Model=M2(),M=2)
get.f(10,Model=M3(),M=3)

538

asked Sep 01 '12 12:09

Falcon-StatGuy

2 Answers

You might want to have a look at the R scoping rules. In particular, there's no reason to expect that variables you define in a function are visible in other functions.

You might be confused because the global environment (i.e. the top-level outside all functions) is an exception from this rule. I'm not going to go into your other questions, but let me note that the entire script looks very messed up to me - i.e. M1 to M3 are essentially one function, and the wad of copy/paste in get.f is definitely terrible. Whatever it is that you're trying to do can definitely be written in a less convoluted way.

Let's have a look at the Ms first - why not one function with a parameter? Including the solution to your scope problem, that makes two parameters -

M <- function(sampleData, offset) { 
    a1 = sampleData[,c(2:5,7,9)] + offset
    b1 = sampleData[,c(2:4,6,8,7)] + offset
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)
}

If you insist on defining aliases, you can also do something like

M1 <- function(sampleData) M(sampleData, 0) 
M2 <- function(sampleData) M(sampleData, 2) 
M3 <- function(sampleData) M(sampleData, 8)

This is already less repetitive, but ideally you want the computer to do the repetition for you (DRY!):

offsets <- c(0,2,8)
Models <- sapply(offsets, FUN=function(offset) function(sampleData) M(sampleData, offset))

Looking at get.f, it's not quite clear what you're trying to do - you're trying to fit something and collect something from the results, but the part about Model$cof refers to an undefined variable (your Model just has a1,b1 and c1 entries). Assuming you want to actually collect cof and discarding the interim code, get.f probably looks like this:

M <- function(sampleData, offset) { 
    a1 = sampleData[,c(2:5,7,9)] + offset
    b1 = sampleData[,c(2:4,6,8,7)] + offset
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)
}

get.f <- function(asim,Model,M){
    sse <-c()
    for(i in 1:asim){
        set.seed(i)
        Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
        Y <- Samdat[,1]
        model <- Model()
        if(M==1){
            a2 <- model$a1
            b2 <- model$b1
            c2 <- model$c1
            s<- a2+b2+c2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            sse <-c(sse,cof)
        }
        else if(M==2){
            a2 <- model$a1
            b2 <- model$b1
            c2 <- model$c1
            s<- c2+12
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            sse <-c(sse,cof)
        }
        else {
            a2 <- model$a1
            b2 <- model$b1
            c2 <- model$c1
            s<- c2+a2
            fit <- lm(Y~s)
            cof <- sum(summary(fit)$coef[,1])
            sse <-c(sse,cof)
        }
    }
    return(sse)
}


get.f(10,Model=M1,M=1) 
get.f(10,Model=M2,M=2)
get.f(10,Model=M3,M=3)

That's still terribly repetitive, so why don't we think about it for a minute? All you're doing with your samples is to calculate one column from them to use in your fit. I don't see why you need to do the calculation in an M function and then do the extraction of the single value in get.f (dependent on which particular M you were using) - this seems indicative that the extraction should much rather be part of M... but if you need to keep them separate, okay, let's use separate extraction functions. Still comes in under half of your code size in reasonably-written R:

# Set up test data
dat <-  cbind(Y=rnorm(20),rnorm(20),runif(20),rexp(20),rnorm(20),runif(20), rexp(20),rnorm(20),runif(20),rexp(20))
nam <- paste("v",1:9,sep="")
colnames(dat) <- c("Y",nam)

# calculate a1..c1 from a sample
M <- function(sampleData, offset) { 
    a1 = sampleData[,c(2:5,7,9)] + offset
    b1 = sampleData[,c(2:4,6,8,7)] + offset
    c1 = b1+a1
    list(a1=a1,b1=b1,c1=c1)
}

# create a fixed-offset model from the variable offset model by fixing offset
makeModel <- function(offset) function(sampleData) M(sampleData, offset)   

# run model against asim subsamples of data and collect coefficients
get.f <- function(asim,model,expected) 
    sapply(1:asim,  function (i){
        set.seed(i)
        Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
        Y <- Samdat[,1]
        s <- expected(model(Samdat))
        fit <- lm(Y~s)
        sum(summary(fit)$coef[,1])
    })

# list of models to run and how to extract the expectation values from the model reslts
todo <- list(
        list(model=makeModel(0), expected=function(data) data$a1+data$b1+data$c1),
        list(model=makeModel(2), expected=function(data) data$c1+12),
        list(model=makeModel(8), expected=function(data) data$c1+data$a1))

sapply(todo, function(l) { get.f(10, l$model, l$expected)})

answered Oct 21 '22 12:10

themel

When you call

get.f(10, Model=M1(), M=1)

your M1 function is immediately called. It dies because inside the body of M1 you are using Samdat which is only defined later, in the body of get.f.

Somehow, you need to call M1 after Samdat is defined. One way of doing that is to make M1 (the function) an argument to get.f and call the function from inside get.f:

get.f <- function(asim, Model.fun, M) {
   ...
   Sambat <- ...
   Model  <- Model.fun()
   ...
}
get.f(10, Model.fun = M1, M=1)

Also, in general, it is bad programming to use global variables, i.e., make your function use variables that are defined outside their scope. Instead, it is recommended that everything a function uses be passed as input arguments. You have two such cases in your code: M1 (M2, and M3) use Samdat and get.f uses dat. They should be arguments to your functions. Here is a nicer version of your code. I have not fixed everything, so you'll have to do a little more to get it to work:

M1 <- function(sampled.data) {
   a1 <- sampled.data[, c("v1", "v2", "v3", "v4", "v6", "v8")]
   b1 <- sampled.data[, c("v1", "v2", "v3", "v5", "v7", "v6")]
   c1 <- a1 + b1
   list(a1 = a1, b1 = b1, c1 = c1)
}

get.f <- function(dat, asim, Model.fun, offset, M) {
   sse <- c()
   for(i in 1:asim){
      set.seed(i)
      Samdat <- dat[sample(1:nrow(dat), nrow(dat), replace = TRUE), ]
      Y      <- Samdat[, "Y"]
      Model  <- Model.fun(sampled.data = Samdat)
      a2     <- Model$a1
      b2     <- Model$b1
      c2     <- Model$c1      
      s      <- switch(M, a2 + b2 + c2, c2 + 12, c2 + a2)
      fit    <- lm(Y ~ s)
      cof    <- sum(summary(fit)$coef[,1])
      coff   <- Model$cof        # there is a problem here...
      sse    <- c(sse, coff)     # this is not efficient
   }
   return(sse)
}

dat <- cbind(Y = rnorm(20), v1 = rnorm(20), v2 = runif(20), v3 = rexp(20),
                            v4 = rnorm(20), v5 = runif(20), v6 = rexp(20),
                            v7 = rnorm(20), v8 = runif(20), v9 = rexp(20))

get.f(dat, 10, Model.fun = M1, M = 1)

One last thing that jumps out: if the definition of s (what I gathered under switch() is related to the Model you use, then consider merging the definitions of Model and s together: add s to the list output of your M1, M2, M3 functions so that s can just be defined as s <- Model$s, and you can then drop the M input to get.f.

answered Oct 21 '22 11:10

flodel

Related questions
                            
                                How to extract outstanding values from an object returned by waldo::compare()?
                            
                                How to swap the names and values of a named vector in R?
                            
                                How to use replace the first occurrences of a string only if it appears more than once in R?
                            
                                Two equivalent commands give different returns in r
                            
                                What makes these two R data frames not identical?
                            
                                How to use ggplot to plot probability densities?
                            
                                Re-inserting NAs into a vector
                            
                                What is the reverse of scan() in R?
                            
                                Step by step procedure on how to run nested logistic regression in R
                            
                                How To Filter a Dataframe based on Category Counts
                            
                                Automatically determine position of plot legend
                            
                                Dealing with durations defined by days, hours, minutes and seconds such as "1d 3h 2m 28s" in R
                            
                                column names have periods inserted where there should be spaces
                            
                                R: cannot predict specific value [duplicate]
                            
                                Merging two data frames in R that have common and uncommon samples
                            
                                Download a file into my working directory
                            
                                find highest value within factor levels
                            
                                Reshape in the middle
                            
                                comparing values in a row
                            
                                Jensen Shannon divergence in R

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With