I am working with the segmented package and have encountered a problem when calling davies.test()
from within a function.
Consider the following situation:
library(segmented)
data = data.frame(x = 1:21, y = c(10:1, 0:10))
fit = lm(y ~ x, data = data)
fit.seg = segmented(fit, seg.Z = ~ x)
davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")
That works perfectly and indicates that the segmented regression has two statistically different slopes.
Now if I package all of that up into a function like this:
testit <- function() {
data = data.frame(x = 1:21, y = c(10:1, 0:10))
fit = lm(y ~ x, data)
fit.seg = segmented(fit, seg.Z = ~ x)
davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}
testit()
Then it works fine...
But if I delete fit
from the global environment then it fails.
> rm(fit)
> testit()
Error in eval(expr, envir, enclos) : object 'fit' not found
The problem seems to be with the way that davies.test
is trying to access the data encapsulated in fit
: it doesn't seem to look for fit
in the enclosing scope (which in this case is the testit
function), but skips directly to the global scope.
I'm sure that the problem relates to some subtlety with R's scoping rules. If I can find a quick fix that would prevent me from troubling the package author with this edge case, that would be great.
Thanks, Andrew.
The enclosing environment is the environment where the function was created. Every function has one and only one enclosing environment. For the three other types of environment, there may be 0, 1, or many environments associated with each function: Binding a function to a name with <- defines a binding environment.
Global environment can be referred to as . GlobalEnv in R codes as well. We can use the ls() function to show what variables and functions are defined in the current environment. Moreover, we can use the environment() function to get the current environment.
The parent environment of a function is the environment in which the function was created. If a function was created in the execution environment (for example, in the global environment), then the environment in which the function was called will be the same as the environment in which the function was created.
Overview. The ls() function in R is used to return a vector of character strings containing all the variables and functions that are defined in the current working directory in R programming. Variables whose names begin with a dot are, by default, not returned.
Try inserting the line marked ##
below. There is still a difference that this does not account for as shown by the warning that appears when the modified testit
is run but the output pvalue is the same so it may be sufficient for your needs. This is, of course, a bug in the package and best would really be to ask the maintainer of the package if they would fix it.
library(segmented)
testit <- function() {
data = data.frame(x = 1:21, y = c(10:1, 0:10))
fit = lm(y ~ x, data)
fit.seg = segmented(fit, seg.Z = ~ x)
environment(davies.test) <- environment() ##
davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}
testit()
giving:
[1] 0.01858149
Warning message:
In summary.lm(object) : essentially perfect fit: summary may be unreliable
No need to make it a global variable. The problem is actually in segmented
, not davies.test
. It's not finding fit
.
You can use dynGet
to locate fit
in any environment, including the calling function's environment:
testit <- function() {
data = data.frame(x = 1:21, y = c(10:1, 0:10))
fit = lm(y ~ x, data)
fit.seg = segmented(dynGet("fit"), seg.Z = ~ x)
davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}
testit()
That should work as you intend.
If you have multiple variables named fit
in different environments, then use get
(see ?get
) to specify which environment you want to get it from. dynGet
is the "look everywhere; return first" lazy version.
I contacted the author of segmented
and he promptly responded. Another solution he proposed to the original issue would be
testit <- function() {
data = data.frame(x = 1:21, y = c(10:1, 0:10))
fit = lm(y ~ x, data)
fit.seg = segmented(fit, seg.Z = ~ x)
fit.seg$call$obj<-fit
davies.test(fit.seg, seg.Z = ~ x, alternative = "greater")$p.value
}
However, he also pointed out that the lm
object should actually be passed directly to davies.test()
as follows:
testit <- function() {
data = data.frame(x = 1:21, y = c(10:1, 0:10))
fit = lm(y ~ x, data)
davies.test(fit, seg.Z = ~ x, alternative = "greater")$p.value
}
For clarification though, it should be noted that these two bits of code do different things: the second fragment actually fulfills my original purpose (checking for a statistically significant break in the fit), while the first fragment checks to see whether there is a second break.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With