I noticed the following peculiar behavior when I upgraded from R 3.4.3 to R 3.6.0 (both were using data.table 1.12.6). In 3.4.3 the code below leads to the all.equal statement being TRUE, but in 3.6.0 there is a mean relative difference that comes from the fact that even though we are trying to access the approxfun calculated from group "a", the values from group "b" are used (probably somehow due to lazy evaluation). In 3.6.0, this issue can be solved by adding a copy statement in the calls to approxfun based on this question: Handling of closures in data.table
The fascinating thing to me is that I do not get an error in 3.4.3. Any idea what changed?
library(data.table)
data <- data.table(
group = c(rep("a", 4), rep("b", 4)),
x = rep(c(.02, .04, .12, .21), 2),
y = c(
0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)
dtFuncs <- data[ , list(
func = list(stats::approxfun(x, y, rule = 2))
), by = group]
f <- function(group, x) {
dtResults <- CJ(group = group, x = x)
dtResults <- dtResults[ , {
.g <- group
f2 <- dtFuncs[group == .g, func][[1]]
list(x = x, y = f2(x))
}, by = group]
dtResults
}
x0 <- .07
g <- "a"
all.equal(
with(data[group == g], approx(x, y, x0, rule = 2)$y),
f(group = g, x = x0)$y
)
After running git bisect on the r-source, I was able to deduce that it was this commit that caused the behavior: https://github.com/wch/r-source/commit/adcf18b773149fa20f289f2c8f2e45e6f7b0dbfe
What fundamentally happened was that in the case where x's were ordered in approxfun, an internal copy was no longer made. If the data had been randomly sorted, the code would have continued to work! (see snippet below)
Lesson for me is that its probably best not to mix complicated objects with data.table as the same environment is used over and over for each "by" group (or being very deliberate with data.table::copy)
## should be run under R > 3.6.0 to see disparity
library(data.table)
## original sorted x (does not work)
data <- data.table(
group = c(rep("a", 4), rep("b", 4)),
x = rep(c(.02, .04, .12, .21), 2),
y = c(
0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)
dtFuncs <- data[ , {
print(environment())
list(
func = list(stats::approxfun(x, y, rule = 2))
)
}, by = group]
f <- function(group, x) {
dtResults <- CJ(group = group, x = x)
dtResults <- dtResults[ , {
.g <- group
f2 <- dtFuncs[group == .g, func][[1]]
list(x = x, y = f2(x))
}, by = group]
dtResults
}
get("y", environment(dtFuncs$func[[1]]))
get("y", environment(dtFuncs$func[[2]]))
x0 <- .07
g <- "a"
all.equal(
with(data[group == g], approx(x, y, x0, rule = 2)$y),
f(group = g, x = x0)$y
)
## unsorted x (works)
data <- data.table(
group = c(rep("a", 4), rep("b", 4)),
x = rep(c(.02, .04, .12, .21), 2),
y = c(
0.0122, 0.01231, 0.01325, 0.01374, 0.01218, 0.01229, 0.0133, 0.01379)
)
set.seed(10)
data <- data[sample(1:.N, .N)]
dtFuncs <- data[ , {
print(environment())
list(
func = list(stats::approxfun(x, y, rule = 2))
)
}, by = group]
f <- function(group, x) {
dtResults <- CJ(group = group, x = x)
dtResults <- dtResults[ , {
.g <- group
f2 <- dtFuncs[group == .g, func][[1]]
list(x = x, y = f2(x))
}, by = group]
dtResults
}
get("y", environment(dtFuncs$func[[1]]))
get("y", environment(dtFuncs$func[[2]]))
x0 <- .07
g <- "a"
all.equal(
with(data[group == g], approx(x, y, x0, rule = 2)$y),
f(group = g, x = x0)$y
)
## better approach: maybe safer to avoid mixing objects treated by reference
## (data.table & closures) all together...
fList <- lapply(split(data, by = "group"), function(x){
with(x, stats::approxfun(x, y, rule = 2))
})
fList
fList[[1]](.07) != fList[[2]](.07)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With