I can obtain a summary of a variable stratified by other variables as follows:
require(data.table)
DT <- data.table(mtcars)
var_work <- "hp"
by_vars <- c("cyl", "carb")
ans_1 <- cube(DT, j = as.list(quantile(get(var_work))), by = by_vars)
ans_1
cyl carb 0% 25% 50% 75% 100%
1: 6 4 110 110.00 116.5 123.00 123
2: 4 1 65 66.00 66.0 93.00 97
3: 6 1 105 106.25 107.5 108.75 110
4: 8 2 150 150.00 162.5 175.00 175
5: 8 4 205 218.75 237.5 245.00 264
6: 4 2 52 69.25 93.0 105.50 113
7: 8 3 180 180.00 180.0 180.00 180
8: 6 6 175 175.00 175.0 175.00 175
9: 8 8 335 335.00 335.0 335.00 335
10: 6 NA 105 110.00 110.0 123.00 175
11: 4 NA 52 65.50 91.0 96.00 113
12: 8 NA 150 176.25 192.5 241.25 335
13: NA 4 110 123.00 210.0 241.25 264
14: NA 1 65 66.00 93.0 101.00 110
15: NA 2 52 92.00 111.0 150.00 175
16: NA 3 180 180.00 180.0 180.00 180
17: NA 6 175 175.00 175.0 175.00 175
18: NA 8 335 335.00 335.0 335.00 335
19: NA NA 52 96.50 123.0 180.00 335
Next, I would like to write a helper function that does exactly what is shown above, but it produces an error:
my_fun <- function(table_work, var_w, by_v) {
  tab_out <- cube(table_work, j = as.list(quantile(get(var_w))), by = by_v)
  return(tab_out)
}
ans_2 <- my_fun(table_work = DT, var_w = var_work, by_v = by_vars)
Error in get(var_w) : object 'var_w' not found
I have searched relevant blogs (e.g., Advanced tips and tricks with data.table) and posts (e.g., by Henrik, frankc, etc.) for an answer, and tried different combinations of quote(), eval(), get(), assign(), etc. within my_fun, but nothing worked for me.
The question is: how should I correct the my_fun helper function so that it works and produces the same result as ans_1?
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Matrix_1.2-18 fst_0.9.0 data.table_1.12.8
loaded via a namespace (and not attached):
[1] compiler_3.6.1 parallel_3.6.1 Rcpp_1.0.3 grid_3.6.1 lattice_0.20-38
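Reading through the code of data.table:::cube.data.table and data.table:::groupingsets.data.table, the j argument is already handled with non-standard evaluation (NSE): it is captured with substitute() and evaluated later inside data.table's own functions, where global variables such as var_work are still visible but local variables of my_fun such as var_w are not. Since there is no way to pass as.name(var_work) to the env argument of that internal substitute() call, the function fails.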
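To see why the scoping matters, here is a simplified base-R model of the behaviour described above. nse_summary() and wrapper() are hypothetical functions, not data.table code: nse_summary() only imitates the pattern of capturing j with substitute() and evaluating it against the data inside the callee's own frame, which is an assumption about how cube()/groupingsets() behave rather than a copy of them.

```r
# Hypothetical stand-in for cube(): capture `j` unevaluated, then evaluate it
# with the data columns first and THIS function's frame as the enclosure.
# The caller's local variables are never searched.
nse_summary <- function(data, j) {
  jj <- substitute(j)
  eval(jj, envir = data, enclos = environment())
}

df <- data.frame(hp = c(110, 93, 175))
col <- "hp"

# Works: `col` lives in the global environment, which is reachable from
# nse_summary's frame (its enclosing environment).
nse_summary(df, mean(get(col)))

# Hypothetical wrapper, analogous to my_fun: `v` exists only in wrapper's
# frame, so the lookup of `v` inside nse_summary fails.
wrapper <- function(d, v) {
  nse_summary(d, mean(get(v)))
}
res <- tryCatch(wrapper(df, col), error = conditionMessage)
res
```

The call at top level succeeds and the call through wrapper() fails with an "object not found" error, mirroring ans_1 versus my_fun.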
As a workaround, you can use .SDcols:
library(data.table)
DT <- data.table(mtcars)
var_work <- "hp"
by_vars <- c("cyl", "carb")
my_fun <- function(table_work, var_w, by_v) {
  # .SDcols restricts .SD to the column named in var_w, so .SD[[1L]] is that column
  cube(table_work, j = as.list(quantile(.SD[[1L]])), by = by_v, .SDcols = var_w)
}
ans_2 <- my_fun(table_work = DT, var_w = var_work, by_v = by_vars)
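Another option, not part of the answer above and offered only as a sketch, is to splice the column symbol into the whole cube() call with substitute() and then eval() it, so cube() already receives j = as.list(quantile(hp)) and never has to look up var_w itself. my_fun_sub and ans_3 are hypothetical names; the sketch assumes the same data.table setup as in the question.

```r
library(data.table)

DT <- data.table(mtcars)
var_work <- "hp"
by_vars <- c("cyl", "carb")

# Reference result computed at top level, as in the question
ans_1 <- cube(DT, j = as.list(quantile(get(var_work))), by = by_vars)

# Hypothetical helper: build the call with the column symbol in place of VAR.
# Only VAR is substituted; table_work and by_v stay as symbols.
my_fun_sub <- function(table_work, var_w, by_v) {
  call_expr <- substitute(
    cube(table_work, j = as.list(quantile(VAR)), by = by_v),
    list(VAR = as.name(var_w))
  )
  # eval() defaults to the calling frame (this function's frame), where
  # table_work and by_v are defined
  eval(call_expr)
}

ans_3 <- my_fun_sub(table_work = DT, var_w = var_work, by_v = by_vars)

# Compare with the top-level result
all.equal(ans_1, ans_3)
```

Compared with .SDcols, this version lets the column appear by name anywhere in j, at the cost of some metaprogramming.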