data.table .. notation with functions in j

Tags: r, data.table

I am trying to use data.table's .. notation inside functions. Here is the code I have so far:

library(data.table)

set.seed(42)
dt <- data.table(
  x = rnorm(10),
  y = runif(10)
)

test_func <- function(data, var, var2) {
  vars <- c(var, var2)
  data[, ..vars]
}

test_func(dt, 'x', 'y') # this works

test_func2 <- function(data, var, var2) {
  data[, ..var]
}

test_func2(dt, 'x', 'y') # this works too

test_func3 <- function(data, var, var2) {
  data[, sum(..var)]
}

test_func3(dt, 'x', 'y') 
# this does not work
# Error in eval(jsub, SDenv, parent.frame()) : object '..var' not found

It seems data.table does not recognize .. once it is wrapped inside another function in j. I know I can use sum(get(var)) to get the result, but I want to know whether that is the best practice in most situations.

EKtheSage asked Feb 14 '18 22:02


1 Answer

Parroting an answer to a different problem that works here as well. Not the prettiest solution, but variants on this have worked for me numerous times in the past.

Thanks @Frank for a non-parse() solution here!

I'm well aware of the old adage "If the answer is parse() you should usually rethink the question," but I often have a hard time coming up with alternatives when evaluating within the data.table calling environment. I'd love to see a robust solution that doesn't execute arbitrary code passed in as a character string. In fact, half the reason I'm posting an answer like this is the hope that someone can recommend a better option.

test_func3 <- function(data, var, var2) {
  expr = substitute(sum(var), list(var=as.symbol(var)))
  data[, eval(expr)]
}

test_func3(dt, 'x', 'y')
## [1] 5.472968
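
For comparison, here is a rough sketch of three other ways to get the same sum without building an expression: the get() workaround mentioned in the question, .SDcols, and plain [[ extraction. The helper names (sum_get, sum_sd, sum_extract) are made up for illustration. Note that get(var) assumes the table has no column that is itself called var, since column names are looked up first inside j.

```r
library(data.table)

set.seed(42)
dt <- data.table(
  x = rnorm(10),
  y = runif(10)
)

# Hypothetical helper names, for illustration only.

# get() looks up the column whose name is stored in `var`
# (assumes `data` has no column literally named "var").
sum_get <- function(data, var) {
  data[, sum(get(var))]
}

# .SDcols restricts .SD to the requested column; [[1L]] pulls it out as a vector.
sum_sd <- function(data, var) {
  data[, sum(.SD[[1L]]), .SDcols = var]
}

# Skips j entirely: extract the column by name, then sum it.
sum_extract <- function(data, var) {
  sum(data[[var]])
}

sum_get(dt, "x")
sum_sd(dt, "x")
sum_extract(dt, "x")
```

All three should agree with sum(dt$x). The [[ version is the simplest when no grouping is needed; the .SDcols version extends naturally to by.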

Quick disclaimer on hypothetical doomsday scenarios possible with eval(parse(...))

There are far more in-depth discussions of the dangers of eval(parse(...)) elsewhere, so I won't repeat them in full here.

Theoretically you could have issues if one of your columns is named something unfortunate like "(system(paste0('kill ',Sys.getpid())))" (Do not execute that, it will kill your R session on the spot!). This is probably enough of an outside chance to not lose sleep over it unless you plan on putting this in a package on CRAN.
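
To illustrate the difference, here is a small sketch (not tied to data.table) contrasting as.symbol() with parse() on a deliberately awkward name. A harmless stop() call stands in for anything destructive, and the variable names are made up for the example. As far as I understand it, as.symbol() treats the whole string as one name, so nothing inside it is ever run, while parse() turns it into code.

```r
# A column name that would be harmful if it were ever parsed as code.
# stop() stands in for something destructive.
bad_name <- "(stop('executed!'))"

# as.symbol() turns the whole string into a single name; nothing is parsed.
expr_symbol <- substitute(sum(var), list(var = as.symbol(bad_name)))

# parse() reads the string as R code, so the stop() call becomes
# part of the expression.
expr_parsed <- parse(text = paste0("sum(", bad_name, ")"))[[1L]]

# Evaluate both against a small list and capture the error messages.
msg_symbol <- tryCatch(
  eval(expr_symbol, list(x = 1:3)),
  error = function(e) conditionMessage(e)
)
msg_parsed <- tryCatch(
  eval(expr_parsed, list(x = 1:3)),
  error = function(e) conditionMessage(e)
)

is.name(expr_symbol[[2L]])  # the argument to sum() is a plain name
is.call(expr_parsed[[2L]])  # the argument to sum() is a call
msg_symbol                  # a lookup failure for the odd name
msg_parsed                  # the message raised by the embedded stop()
```

The symbol version fails only because no object with that name exists, whereas the parsed version actually runs the embedded call.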


Update:

For the specific case in the comments below, where the table is grouped and sum is then applied to every selected column, .SDcols is potentially useful. The only way I'm aware of to make sure this function returns consistent results even if dt had a column named var3 is to evaluate the arguments within the function environment, but outside of the data.table environment, using c().

set.seed(42)
dt <- data.table(
  x = rnorm(10),
  y = rnorm(10),
  z = sample(c("a","b","c"),size = 10, replace = TRUE)
)


test_func3 <- function(data, var, var2, var3) {
  ListOfColumns = c(var,var2)
  GroupColumn <- c(var3)
  data[, lapply(.SD, sum), by = eval(GroupColumn), .SDcols = ListOfColumns]
}

test_func3(dt, 'x', 'y','z') 

returns

   z         x         y
1: b 1.0531555  2.121852
2: a 0.3631284 -1.388861
3: c 4.0566838 -2.367558
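
As a quick check, the same pattern can be run on a small table where the sums are easy to verify by hand. This is only a sketch: grouped_sum and the table other are hypothetical names, and the body refers to its data argument so that it works on a table that is not called dt.

```r
library(data.table)

# Hypothetical name; same grouped-sum pattern, written against `data`.
grouped_sum <- function(data, var, var2, var3) {
  ListOfColumns <- c(var, var2)
  GroupColumn <- c(var3)
  data[, lapply(.SD, sum), by = eval(GroupColumn), .SDcols = ListOfColumns]
}

# Small made-up table: group "u" holds rows 1 and 3, group "v" rows 2 and 4.
other <- data.table(
  a = 1:4,
  b = c(10, 20, 30, 40),
  g = c("u", "v", "u", "v")
)

res <- grouped_sum(other, "a", "b", "g")
res
```

Group u should come out as a = 4, b = 40 and group v as a = 6, b = 60.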

Matt Summersgill answered Oct 14 '22 06:10