I am trying to use data.table within a user-facing function in a package I'm working on. I would like this function to behave as much like data.table as possible. For example, my function also has a by argument, which is passed to the underlying data.table call inside the function. The user should be free to pass into "my" by anything that they could use directly in a data.table call.
Citing from ?data.table, this includes:
- A single unquoted column name: e.g., DT[, .(sa=sum(a)), by=x]
- a list() of expressions of column names: e.g., DT[, .(sa=sum(a)), by=.(x=x>0, y)]
- a single character string containing comma separated column names (where spaces are significant since column names may contain spaces even at the start or end): e.g., DT[, sum(a), by="x,y,z"]
- a character vector of column names: e.g., DT[, sum(a), by=c("x", "y")]
- or of the form startcol:endcol: e.g., DT[, sum(a), by=x:z]
Here is a minimal (partially) working example to make my intent clear:
library(data.table)
#> Warning: package 'data.table' was built under R version 3.6.2
sample_dt <- data.table(a = 1:5, b = 5:1)
count_by <- function(dt, by = NULL) {
by <- substitute(by)
dt[, .N, by = eval(by, dt, parent.frame())]
}
count_by(sample_dt)
#> N
#> 1: 5
count_by(sample_dt, by = a) # refers to 1 from the list above
#> by N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = list(a)) # refers to 2 from the list above
#> a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = "a") # refers to 3 from the list above
#> a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = c("a")) # refers to 4 from the list above
#> Error in `[.data.table`(dt, , .N, by = eval(by, dt, parent.frame())): 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=evalc("a") should work. This is for efficiency so data.table can detect which columns are needed.
count_by(sample_dt, by = a:b) # refers to 5 from the list above
#> a b N
#> 1: 1 5 1
#> 2: 2 4 1
#> 3: 3 3 1
#> 4: 4 2 1
#> 5: 5 1 1
Created on 2020-02-18 by the reprex package (v0.3.0)
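For comparison, here is a quick check (a sketch that recreates sample_dt from the example above) that the case 4 form is accepted when it is written directly in a data.table call. This suggests the problem lies in how the wrapper forwards by, not in by = c(...) itself.

```r
library(data.table)

sample_dt <- data.table(a = 1:5, b = 5:1)

# The same by forms typed directly into [.data.table, with no wrapper function
direct_4 <- sample_dt[, .N, by = c("a")]  # case 4, the form that errors above
direct_3 <- sample_dt[, .N, by = "a"]     # case 3, for comparison
```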
Apart from case 4, everything works as expected using simple substitution and evaluation in the proper context. So my question is:
How can I create functions that use data.table internally and mimic the original by interface exactly?
Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2020-02-18
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.2)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> callr 3.4.1 2020-01-24 [1] CRAN (R 3.6.2)
#> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.2)
#> data.table * 1.12.8 2019-12-09 [1] CRAN (R 3.6.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.2)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.2)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.2)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.2)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.2)
#> knitr 1.27 2020-01-16 [1] CRAN (R 3.6.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.2)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.2)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.2)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.2)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.2)
#> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.2)
#> stringi 1.4.4 2020-01-09 [1] CRAN (R 3.6.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.2)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.2)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.2)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.2)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2)
#>
#> [1] C:/Program Files/R/library
Is there a particular reason for using eval inside the data.table call? I think this would be better:
count_by <- function(dt, by = NULL) {
eval(substitute(dt[, .N, by = by]))
}
It passes all of the test cases above (of course), including case 1 (by = a), where your function names the grouping column by instead of a.
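This works because substitute() replaces dt and by in the whole call with the expressions the caller wrote, so [.data.table receives, say, by = c("a") exactly as if it had been typed directly. Below is a minimal sketch that exercises the five cases. It makes one assumed change that the answer does not include: it passes parent.frame() as the evaluation environment, a common R idiom. With the version above, the call is evaluated in count_by's own frame, so a data.table that exists only inside some calling function would not be found. The wrapper function is hypothetical and only illustrates that situation.

```r
library(data.table)

# Answer's approach, with one assumed tweak: evaluate in the caller's frame
# so that objects local to the calling function (not only globals) are found.
count_by <- function(dt, by = NULL) {
  # substitute() splices the caller's expressions for `dt` and `by` into the
  # call, so data.table sees e.g. by = c("a") exactly as the user wrote it
  eval(substitute(dt[, .N, by = by]), parent.frame())
}

sample_dt <- data.table(a = 1:5, b = 5:1)

res_none <- count_by(sample_dt)             # no grouping
res_1 <- count_by(sample_dt, by = a)        # case 1: unquoted column name
res_2 <- count_by(sample_dt, by = list(a))  # case 2: list() of columns
res_3 <- count_by(sample_dt, by = "a")      # case 3: single string
res_4 <- count_by(sample_dt, by = c("a"))   # case 4: character vector
res_5 <- count_by(sample_dt, by = a:b)      # case 5: startcol:endcol

# Hypothetical caller whose data.table exists only in its own frame
wrapper <- function() {
  local_dt <- data.table(a = c(1L, 1L, 2L))
  count_by(local_dt, by = a)
}
res_local <- wrapper()
```

The trade-off is that the expression is now resolved entirely in the user's environment, which matches how a direct data.table call behaves.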