Can you please explain to me why the code complains saying that Samdat
is not found?
I am trying to switch between models, so I declared a function that contains these specific models and I just need to call this function as one of the argument in the get.f
function where the resampling will change the structure for each design matrix in the model. The code complains that Samdat
is not found when it is found.
Also, is there a way I can make the conditional statement if(Model == M1())
instead of having to create another argument M
to set if(M==1)
?
Here is my code:
dat <- cbind(Y=rnorm(20),rnorm(20),runif(20),rexp(20),rnorm(20),runif(20), rexp(20),rnorm(20),runif(20),rexp(20))
nam <- paste("v",1:9,sep="")
colnames(dat) <- c("Y",nam)
M1 <- function(){
a1 = cbind(Samdat[,c(2:5,7,9)])
b1 = cbind(Samdat[,c(2:4,6,8,7)])
c1 = b1+a1
list(a1=a1,b1=b1,c1=c1)}
M2 <- function(){
a1= cbind(Samdat[,c(2:5,7,9)])+2
b1= cbind(Samdat[,c(2:4,6,8,7)])+2
c1 = a1+b1
list(a1=a1,b1=b1,c1=c1)}
M3 <- function(){
a1= cbind(Samdat[,c(2:5,7,9)])+8
b1= cbind(Samdat[,c(2:4,6,8,7)])+8
c1 = a1+b1
list(a1=a1,b1=b1,c1=c1)}
#################################################################
get.f <- function(asim,Model,M){
sse <-c()
for(i in 1:asim){
set.seed(i)
Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
Y <- Samdat[,1]
if(M==1){
a2 <- Model$a1
b2 <- Model$b1
c2 <- Model$c1
s<- a2+b2+c2
fit <- lm(Y~s)
cof <- sum(summary(fit)$coef[,1])
coff <-Model$cof
sse <-c(sse,coff)
}
else if(M==2){
a2 <- Model$a1
b2 <- Model$b1
c2 <- Model$c1
s<- c2+12
fit <- lm(Y~s)
cof <- sum(summary(fit)$coef[,1])
coff <-Model$cof
sse <-c(sse,coff)
}
else {
a2 <- Model$a1
b2 <- Model$b1
c2 <- Model$c1
s<- c2+a2
fit <- lm(Y~s)
cof <- sum(summary(fit)$coef[,1])
coff <- Model$cof
sse <-c(sse,coff)
}
}
return(sse)
}
get.f(10,Model=M1(),M=1)
get.f(10,Model=M2(),M=2)
get.f(10,Model=M3(),M=3)
A nested function or the enclosing function is a function that is defined within another function. In simpler words, a nested function is a function in another function. There are two ways to create a nested function in the R programming language: Calling a function within another function we created.
R within() function calculates the outcome of the expression within itself but with a slight difference. It allows us to create a copy of the data frame and add a column that would eventually store the result of the R expression.
As you become more comfortable with R, you will find that it is more efficient to code using nested functions, or functions within other functions, which will allow you to execute multiple commands at the same time.
A nested function is a function that is completely contained within a parent function. Any function in a program file can include a nested function. The primary difference between nested functions and other types of functions is that they can access and modify variables that are defined in their parent functions.
You might want to have a look at the R scoping rules. In particular, there's no reason to expect that variables you define in a function are visible in other functions.
You might be confused because the global environment (i.e. the top-level outside all functions) is an exception from this rule. I'm not going to go into your other questions, but let me note that the entire script looks very messed up to me - i.e. M1
to M3
are essentially one function, and the wad of copy/paste in get.f
is definitely terrible. Whatever it is that you're trying to do can definitely be written in a less convoluted way.
Let's have a look at the M
s first - why not one function with a parameter? Including the solution to your scope problem, that makes two parameters -
M <- function(sampleData, offset) {
a1 = sampleData[,c(2:5,7,9)] + offset
b1 = sampleData[,c(2:4,6,8,7)] + offset
c1 = b1+a1
list(a1=a1,b1=b1,c1=c1)
}
If you insist on defining aliases, you can also do something like
M1 <- function(sampleData) M(sampleData, 0)
M2 <- function(sampleData) M(sampleData, 2)
M3 <- function(sampleData) M(sampleData, 8)
This is already less repetitive, but ideally you want the computer to do the repetition for you (DRY!):
offsets <- c(0,2,8)
Models <- sapply(offsets, FUN=function(offset) function(sampleData) M(sampleData, offset))
Looking at get.f
, it's not quite clear what you're trying to do - you're trying to fit something and collect something from the results, but the part about Model$cof
refers to an undefined variable (your Model
just has a1
,b1
and c1
entries). Assuming you want to actually collect cof
and discarding the interim code, get.f
probably looks like this:
M <- function(sampleData, offset) {
a1 = sampleData[,c(2:5,7,9)] + offset
b1 = sampleData[,c(2:4,6,8,7)] + offset
c1 = b1+a1
list(a1=a1,b1=b1,c1=c1)
}
get.f <- function(asim,Model,M){
sse <-c()
for(i in 1:asim){
set.seed(i)
Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
Y <- Samdat[,1]
model <- Model()
if(M==1){
a2 <- model$a1
b2 <- model$b1
c2 <- model$c1
s<- a2+b2+c2
fit <- lm(Y~s)
cof <- sum(summary(fit)$coef[,1])
sse <-c(sse,cof)
}
else if(M==2){
a2 <- model$a1
b2 <- model$b1
c2 <- model$c1
s<- c2+12
fit <- lm(Y~s)
cof <- sum(summary(fit)$coef[,1])
sse <-c(sse,cof)
}
else {
a2 <- model$a1
b2 <- model$b1
c2 <- model$c1
s<- c2+a2
fit <- lm(Y~s)
cof <- sum(summary(fit)$coef[,1])
sse <-c(sse,cof)
}
}
return(sse)
}
get.f(10,Model=M1,M=1)
get.f(10,Model=M2,M=2)
get.f(10,Model=M3,M=3)
That's still terribly repetitive, so why don't we think about it for a minute? All you're doing with your samples is to calculate one column from them to use in your fit. I don't see why you need to do the calculation in an M
function and then do the extraction of the single value in get.f
(dependent on which particular M you were using) - this seems indicative that the extraction should much rather be part of M... but if you need to keep them separate, okay, let's use separate extraction functions. Still comes in under half of your code size in reasonably-written R:
# Set up test data
dat <- cbind(Y=rnorm(20),rnorm(20),runif(20),rexp(20),rnorm(20),runif(20), rexp(20),rnorm(20),runif(20),rexp(20))
nam <- paste("v",1:9,sep="")
colnames(dat) <- c("Y",nam)
# calculate a1..c1 from a sample
M <- function(sampleData, offset) {
a1 = sampleData[,c(2:5,7,9)] + offset
b1 = sampleData[,c(2:4,6,8,7)] + offset
c1 = b1+a1
list(a1=a1,b1=b1,c1=c1)
}
# create a fixed-offset model from the variable offset model by fixing offset
makeModel <- function(offset) function(sampleData) M(sampleData, offset)
# run model against asim subsamples of data and collect coefficients
get.f <- function(asim,model,expected)
sapply(1:asim, function (i){
set.seed(i)
Samdat <- dat[sample(1:nrow(dat),nrow(dat),replace=T),]
Y <- Samdat[,1]
s <- expected(model(Samdat))
fit <- lm(Y~s)
sum(summary(fit)$coef[,1])
})
# list of models to run and how to extract the expectation values from the model reslts
todo <- list(
list(model=makeModel(0), expected=function(data) data$a1+data$b1+data$c1),
list(model=makeModel(2), expected=function(data) data$c1+12),
list(model=makeModel(8), expected=function(data) data$c1+data$a1))
sapply(todo, function(l) { get.f(10, l$model, l$expected)})
When you call
get.f(10, Model=M1(), M=1)
your M1
function is immediately called. It dies because inside the body of M1
you are using Samdat
which is only defined later, in the body of get.f
.
Somehow, you need to call M1
after Samdat
is defined. One way of doing that is to make M1
(the function) an argument to get.f
and call the function from inside get.f
:
get.f <- function(asim, Model.fun, M) {
...
Sambat <- ...
Model <- Model.fun()
...
}
get.f(10, Model.fun = M1, M=1)
Also, in general, it is bad programming to use global variables, i.e., make your function use variables that are defined outside their scope. Instead, it is recommended that everything a function uses be passed as input arguments. You have two such cases in your code: M1
(M2
, and M3
) use Samdat
and get.f
uses dat
. They should be arguments to your functions. Here is a nicer version of your code. I have not fixed everything, so you'll have to do a little more to get it to work:
M1 <- function(sampled.data) {
a1 <- sampled.data[, c("v1", "v2", "v3", "v4", "v6", "v8")]
b1 <- sampled.data[, c("v1", "v2", "v3", "v5", "v7", "v6")]
c1 <- a1 + b1
list(a1 = a1, b1 = b1, c1 = c1)
}
get.f <- function(dat, asim, Model.fun, offset, M) {
sse <- c()
for(i in 1:asim){
set.seed(i)
Samdat <- dat[sample(1:nrow(dat), nrow(dat), replace = TRUE), ]
Y <- Samdat[, "Y"]
Model <- Model.fun(sampled.data = Samdat)
a2 <- Model$a1
b2 <- Model$b1
c2 <- Model$c1
s <- switch(M, a2 + b2 + c2, c2 + 12, c2 + a2)
fit <- lm(Y ~ s)
cof <- sum(summary(fit)$coef[,1])
coff <- Model$cof # there is a problem here...
sse <- c(sse, coff) # this is not efficient
}
return(sse)
}
dat <- cbind(Y = rnorm(20), v1 = rnorm(20), v2 = runif(20), v3 = rexp(20),
v4 = rnorm(20), v5 = runif(20), v6 = rexp(20),
v7 = rnorm(20), v8 = runif(20), v9 = rexp(20))
get.f(dat, 10, Model.fun = M1, M = 1)
One last thing that jumps out: if the definition of s
(what I gathered under switch()
is related to the Model
you use, then consider merging the definitions of Model
and s
together: add s
to the list output of your M1
, M2
, M3
functions so that s
can just be defined as s <- Model$s
, and you can then drop the M
input to get.f
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With