I am trying to use data.table within a user-facing function in a package I'm working on. I would like this function to behave as much like data.table as possible. For example, my function has its own by argument, which is passed to the underlying data.table call inside the function. The user should be free to pass anything to "my" by that would be valid as by in a direct data.table call.
Quoting ?data.table, this includes:
- A single unquoted column name: e.g., DT[, .(sa=sum(a)), by=x]
- a list() of expressions of column names: e.g., DT[, .(sa=sum(a)), by=.(x=x>0, y)]
- a single character string containing comma separated column names (where spaces are significant since column names may contain spaces even at the start or end): e.g., DT[, sum(a), by="x,y,z"]
- a character vector of column names: e.g., DT[, sum(a), by=c("x", "y")]
- or of the form startcol:endcol: e.g., DT[, sum(a), by=x:z]
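For reference, here are the five forms called directly on a data.table. This is a small sketch on a made-up table (DT and its columns x, y, a are illustrative, not from the question); it shows the grouping columns each form produces, which is the behaviour a wrapper would have to reproduce.

```r
library(data.table)

# Small illustrative table (hypothetical data, not from the question)
DT <- data.table(x = c(1, -1, 2), y = c("p", "q", "p"), a = 1:3)

r1 <- DT[, .(sa = sum(a)), by = x]                # 1. single unquoted column name
r2 <- DT[, .(sa = sum(a)), by = .(x = x > 0, y)]  # 2. list() / .() of expressions
r3 <- DT[, sum(a), by = "x,y"]                    # 3. one comma-separated string
r4 <- DT[, sum(a), by = c("x", "y")]              # 4. character vector
r5 <- DT[, sum(a), by = x:y]                      # 5. startcol:endcol

# Inspect the column names each form produces
lapply(list(r1, r2, r3, r4, r5), names)
```

Forms 3, 4 and 5 all group by x and y, so they should give the same table.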
Here is a minimal (partially) working example to make my intent clear:
library(data.table)
#> Warning: package 'data.table' was built under R version 3.6.2
sample_dt <- data.table(a = 1:5, b = 5:1)
count_by <- function(dt, by = NULL) {
  by <- substitute(by)
  dt[, .N, by = eval(by, dt, parent.frame())]
}
count_by(sample_dt)
#> N
#> 1: 5
count_by(sample_dt, by = a) # refers to 1 from the list above
#> by N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = list(a)) # refers to 2 from the list above
#> a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = "a") # refers to 3 from the list above
#> a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = c("a")) # refers to 4 from the list above
#> Error in `[.data.table`(dt, , .N, by = eval(by, dt, parent.frame())): 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=evalc("a") should work. This is for efficiency so data.table can detect which columns are needed.
count_by(sample_dt, by = a:b) # refers to 5 from the list above
#> a b N
#> 1: 1 5 1
#> 2: 2 4 1
#> 3: 3 3 1
#> 4: 4 2 1
#> 5: 5 1 1
Created on 2020-02-18 by the reprex package (v0.3.0)
Apart from case 4, everything works as expected using simple substitution and evaluation in the proper context. So my question is:
How can I create functions that use data.table internally and mimic the original by user interface exactly?
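Any wrapper like this relies on substitute() returning the unevaluated by expression. A quick base-R check of what it captures for each of the five forms is sketched below; capture_by() is a hypothetical helper, not part of data.table. Case 4 captures a c() call, and the error message in case 4 suggests that data.table only special-cases c() or key() when it sees them directly in by, whereas in count_by() the call is reached through eval().

```r
# Hypothetical helper: returns the unevaluated expression passed as `by`
capture_by <- function(by = NULL) substitute(by)

captured <- list(
  capture_by(a),        # 1. bare column name -> a symbol
  capture_by(list(a)),  # 2. a list() call
  capture_by("a"),      # 3. a string constant
  capture_by(c("a")),   # 4. a c() call
  capture_by(a:b)       # 5. a `:` call
)

# Show each captured expression as text
vapply(captured, deparse, character(1))
```

With no argument, capture_by() returns the default expression NULL, which is why count_by(sample_dt) groups by nothing.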
Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2020-02-18
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.2)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> callr 3.4.1 2020-01-24 [1] CRAN (R 3.6.2)
#> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.2)
#> data.table * 1.12.8 2019-12-09 [1] CRAN (R 3.6.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.2)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.2)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.2)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.2)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.2)
#> knitr 1.27 2020-01-16 [1] CRAN (R 3.6.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.2)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.2)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.2)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.2)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.2)
#> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.2)
#> stringi 1.4.4 2020-01-09 [1] CRAN (R 3.6.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.2)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.2)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.2)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.2)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2)
#>
#> [1] C:/Program Files/R/library
Is there a particular reason for using eval inside the data.table call? I think this would be better:
count_by <- function(dt, by = NULL) {
  eval(substitute(dt[, .N, by = by]))
}
It passes all of your test cases (of course), including the first one, where your function names the grouping column by instead of a.
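The claim can be checked with a small sketch that runs the answer's count_by() on all five forms (plus the no-by case) and inspects the result names. substitute() rewrites the whole call to, for example, sample_dt[, .N, by = c("a")], so data.table sees the user's by expression literally, exactly as if it had been typed directly.

```r
library(data.table)

sample_dt <- data.table(a = 1:5, b = 5:1)

# Answer's approach: substitute the user's expressions into the whole call,
# then evaluate that call in count_by()'s own frame
count_by <- function(dt, by = NULL) {
  eval(substitute(dt[, .N, by = by]))
}

res <- list(
  none    = count_by(sample_dt),
  name    = count_by(sample_dt, by = a),
  list_a  = count_by(sample_dt, by = list(a)),
  string  = count_by(sample_dt, by = "a"),
  charvec = count_by(sample_dt, by = c("a")),
  range   = count_by(sample_dt, by = a:b)
)

# Grouping columns should now be named as in a direct data.table call
lapply(res, names)
```

Because the assembled call is evaluated in count_by()'s own frame, any non-column name used in by (for example a variable holding column names) is looked up from that frame and its enclosing environments. A variable that is local to the caller's function may therefore not be visible; that case is not covered by the checks above.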