I am trying to use data.table's ..
notation with functions, here is the code I have so far:
set.seed(42)
dt <- data.table(
x = rnorm(10),
y = runif(10)
)
test_func <- function(data, var, var2) {
vars <- c(var, var2)
data[, ..vars]
}
test_func(dt, 'x', 'y') # this works
test_func2 <- function(data, var, var2) {
data[, ..var]
}
test_func2(dt, 'x', 'y') # this works too
test_func3 <- function(data, var, var2) {
data[, sum(..var)]
}
test_func3(dt, 'x', 'y')
# this does not work
# Error in eval(jsub, SDenv, parent.frame()) : object '..var' not found
It seems data.table
does not recognize ..
once it's wrapped inside another function in j
. I know I can use sum(get(var))
to achieve the results but I want to know I am using the best practice in most situation.
Parroting an answer to a different problem that works here as well. Not the prettiest solution, but variants on this have worked for me numerous times in the past.
Thanks @Frank for a non-parse()
solution here!
I'm well familiar with the old adage "If the answer is parse() you should usually rethink the question.", but I have a hard time coming up with alternatives many times when evaluating within the data.table
calling environment, I'd love to see a robust solution that doesn't execute arbitrary code passed in as a character string. In fact, half the reason I'm posting an answer like this is in hopes that someone can recommend a better option.
test_func3 <- function(data, var, var2) {
expr = substitute(sum(var), list(var=as.symbol(var)))
data[, eval(expr)]
}
test_func3(dt, 'x', 'y')
## [1] 5.472968
Quick disclaimer on hypothetical doomsday scenarios possible with eval(parse(...))
There are far more in depth discussions on the dangers of eval(parse(...))
, but I'll avoid repeating them in full.
Theoretically you could have issues if one of your columns is named something unfortunate like "(system(paste0('kill ',Sys.getpid())))"
(Do not execute that, it will kill your R session on the spot!). This is probably enough of an outside chance to not lose sleep over it unless you plan on putting this in a package on CRAN.
For the specific case in the comments below where the table is grouped and then sum
is applied to all, .SDcols
is potentially useful. The only way I'm aware of to make sure that this function would return consistent results even if dt
had a column named var3
is to evaluate the arguments within the function environment but outside of the data.table
environment using c()
.
set.seed(42)
dt <- data.table(
x = rnorm(10),
y = rnorm(10),
z = sample(c("a","b","c"),size = 10, replace = TRUE)
)
test_func3 <- function(data, var, var2, var3) {
ListOfColumns = c(var,var2)
GroupColumn <- c(var3)
dt[, lapply(.SD, sum), by= eval(GroupColumn), .SDcols = ListOfColumns]
}
test_func3(dt, 'x', 'y','z')
returns
z x y
1: b 1.0531555 2.121852
2: a 0.3631284 -1.388861
3: c 4.0566838 -2.367558
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With