Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Function accessing data from enclosing environment

Tags:

r

I am working with the segmented package and have encountered a problem when calling davies.test() from within a function.

Consider the following situation:

library(segmented)

data = data.frame(x = 1:21, y = c(10:1, 0:10))
fit = lm(y ~ x, data = data)
fit.seg = segmented(fit, seg.Z = ~ x)
davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")

That works perfectly and indicates that the segmented regression has two statistically different slopes.

Now if I package all of that up into a function like this:

testit <- function() {
  data = data.frame(x = 1:21, y = c(10:1, 0:10))
  fit = lm(y ~ x, data)
  fit.seg = segmented(fit, seg.Z = ~ x)
  davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}
testit()

Then it works fine...

But if I delete fit from the global environment then it fails.

> rm(fit)
> testit()
 Error in eval(expr, envir, enclos) : object 'fit' not found

The problem seems to be with the way that davies.test is trying to access the data encapsulated in fit: it doesn't seem to look for fit in the enclosing scope (which in this case is the testit function), but skips directly to the global scope.

I'm sure that the problem relates to some subtlety with R's scoping rules. If I can find a quick fix that would prevent me from troubling the package author with this edge case, that would be great.

Thanks, Andrew.

like image 521
datawookie Avatar asked Jan 04 '16 12:01

datawookie


People also ask

What is an enclosing environment in R?

The enclosing environment is the environment where the function was created. Every function has one and only one enclosing environment. For the three other types of environment, there may be 0, 1, or many environments associated with each function: Binding a function to a name with <- defines a binding environment.

How do I access global environment in R?

Global environment can be referred to as . GlobalEnv in R codes as well. We can use the ls() function to show what variables and functions are defined in the current environment. Moreover, we can use the environment() function to get the current environment.

What is the parent environment of a function R?

The parent environment of a function is the environment in which the function was created. If a function was created in the execution environment (for example, in the global environment), then the environment in which the function was called will be the same as the environment in which the function was created.

What does ls () mean in R?

Overview. The ls() function in R is used to return a vector of character strings containing all the variables and functions that are defined in the current working directory in R programming. Variables whose names begin with a dot are, by default, not returned.


3 Answers

Try inserting the line marked ## below. There is still a difference that this does not account for as shown by the warning that appears when the modified testit is run but the output pvalue is the same so it may be sufficient for your needs. This is, of course, a bug in the package and best would really be to ask the maintainer of the package if they would fix it.

library(segmented)
testit <- function() {
  data = data.frame(x = 1:21, y = c(10:1, 0:10))
  fit = lm(y ~ x, data)
  fit.seg = segmented(fit, seg.Z = ~ x)
  environment(davies.test) <- environment() ##
  davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}
testit()

giving:

[1] 0.01858149
Warning message:
In summary.lm(object) : essentially perfect fit: summary may be unreliable
like image 74
G. Grothendieck Avatar answered Nov 05 '22 14:11

G. Grothendieck


No need to make it a global variable. The problem is actually in segmented, not davies.test. It's not finding fit.

You can use dynGet to locate fit in any environment, including the calling function's environment:

testit <- function() {
  data = data.frame(x = 1:21, y = c(10:1, 0:10))
  fit = lm(y ~ x, data)
  fit.seg = segmented(dynGet("fit"), seg.Z = ~ x)
  davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}
testit()

That should work as you intend.

If you have multiple variables named fit in different environments, then use get (see ?get) to specify which environment you want to get it from. dynGet is the "look everywhere; return first" lazy version.

like image 36
Mekki MacAulay Avatar answered Nov 05 '22 14:11

Mekki MacAulay


I contacted the author of segmented and he promptly responded. Another solution he proposed to the original issue would be

testit <- function() {
  data = data.frame(x = 1:21, y = c(10:1, 0:10))
  fit = lm(y ~ x, data)
  fit.seg = segmented(fit, seg.Z = ~ x)
  fit.seg$call$obj<-fit
  davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}

However, he also pointed out that the lm object should actually be passed directly to davies.test() as follows:

testit <- function() {
  data = data.frame(x = 1:21, y = c(10:1, 0:10))
  fit = lm(y ~ x, data)
  davies.test(fit, seg.Z = ~ x, alternative = "greater")$p.value
}

For clarification though, it should be noted that these two bits of code do different things: the second fragment actually fulfills my original purpose (checking for a statistically significant break in the fit), while the first fragment checks to see whether there is a second break.

like image 28
datawookie Avatar answered Nov 05 '22 13:11

datawookie