How to use R data.table column names with cube(..., j = ,...) within a function?

Tags: r, data.table

I can obtain a summary of a variable stratified by other variables as follows:

require(data.table)

DT <- data.table(mtcars)

var_work <- "hp"
by_vars <- c("cyl", "carb")

ans_1 <- cube(DT, j = as.list(quantile(get(var_work))), by = by_vars)

ans_1
    cyl carb  0%    25%   50%    75% 100%
 1:   6    4 110 110.00 116.5 123.00  123
 2:   4    1  65  66.00  66.0  93.00   97
 3:   6    1 105 106.25 107.5 108.75  110
 4:   8    2 150 150.00 162.5 175.00  175
 5:   8    4 205 218.75 237.5 245.00  264
 6:   4    2  52  69.25  93.0 105.50  113
 7:   8    3 180 180.00 180.0 180.00  180
 8:   6    6 175 175.00 175.0 175.00  175
 9:   8    8 335 335.00 335.0 335.00  335
10:   6   NA 105 110.00 110.0 123.00  175
11:   4   NA  52  65.50  91.0  96.00  113
12:   8   NA 150 176.25 192.5 241.25  335
13:  NA    4 110 123.00 210.0 241.25  264
14:  NA    1  65  66.00  93.0 101.00  110
15:  NA    2  52  92.00 111.0 150.00  175
16:  NA    3 180 180.00 180.0 180.00  180
17:  NA    6 175 175.00 175.0 175.00  175
18:  NA    8 335 335.00 335.0 335.00  335
19:  NA   NA  52  96.50 123.0 180.00  335

Next, I would like to wrap exactly the same call in a helper function, but doing so produces an error:

my_fun <- function(table_work, var_w, by_v) {

    tab_out <- cube(table_work, j = as.list(quantile(get(var_w))), by = by_v)
    return(tab_out)

}

ans_2 <- my_fun(table_work = DT, var_w = var_work, by_v = by_vars)

Error in get(var_w) : object 'var_w' not found

I have searched relevant blogs (e.g., Advanced tips and tricks with data.table) and posts (e.g., by Henrik, frankc etc.) for an answer, and tried different combinations of quote(), eval(), get(), assign() etc. within my_fun, but nothing worked.

The question is: how should I correct the my_fun helper function so that it works and produces the same result as ans_1?

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-18     fst_0.9.0         data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.6.1  parallel_3.6.1  Rcpp_1.0.3      grid_3.6.1      lattice_0.20-38

Max Moldovan asked Jan 07 '20 05:01



1 Answer

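Reading through the code for data.table:::cube.data.table and data.table:::groupingsets.data.table shows that the j argument is captured with substitute() and evaluated later, inside those functions, using NSE. At that point get(var_w) is no longer evaluated in the frame of my_fun, so the local variable var_w cannot be found, and there is no way to pass as.name(var_w) into the environment argument of substitute beforehand. The top-level call works only because var_work lives in the global environment, which is still reachable from inside cube().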
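The scoping issue can be mimicked without data.table. The sketch below uses two hypothetical functions, toy_cube and toy_inner, which only imitate the capture-then-evaluate pattern described above; they are not the actual data.table internals, and the names col_global, col_local and wrapper are made up for illustration.

```r
# Hypothetical stand-ins for cube() and its internal workhorse (not data.table code).
toy_inner <- function(jj, data) {
  # Columns are looked up in `data`; every other symbol is looked up in this
  # function's frame and then in its enclosing (global) environment.
  eval(jj, data, enclos = environment())
}

toy_cube <- function(data, j) {
  jj <- substitute(j)  # capture the unevaluated j expression
  toy_inner(jj, data)
}

# Works at top level: col_global is visible from the global environment.
col_global <- "hp"
top_level <- toy_cube(list(hp = c(1, 2, 3)), sum(get(col_global)))

# Fails inside a wrapper: col_local exists only in the wrapper's frame,
# which is not on the lookup path when the expression is finally evaluated.
wrapper <- function(data, col_local) {
  toy_cube(data, sum(get(col_local)))
}
res <- try(wrapper(list(hp = c(1, 2, 3)), "hp"), silent = TRUE)
failed <- inherits(res, "try-error")
```

Here failed is TRUE for the same kind of reason that my_fun reports that object 'var_w' is not found.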

As a workaround, you can use .SDcols:

library(data.table)    
DT <- data.table(mtcars)    
var_work <- "hp"
by_vars <- c("cyl", "carb")

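my_fun <- function(table_work, var_w, by_v) {
    # .SDcols restricts .SD to the single column named in var_w,
    # so .SD[[1L]] is that column and j needs no get() lookup
    cube(table_work, j=as.list(quantile(.SD[[1L]])), by=by_v, .SDcols=var_w)
}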

ans_2 <- my_fun(table_work = DT, var_w = var_work, by_v = by_vars)
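Another option, offered only as a sketch and not part of the workaround above, is to splice the column name into the j expression before cube() captures it, so that j contains a plain column symbol rather than a lookup of a local variable. The example assumes data.table is installed; my_fun_sd and my_fun_bquote are hypothetical names used for comparison.

```r
library(data.table)

DT <- data.table(mtcars)
var_work <- "hp"
by_vars <- c("cyl", "carb")

# The .SDcols workaround, repeated here so the two results can be compared.
my_fun_sd <- function(table_work, var_w, by_v) {
  cube(table_work, j = as.list(quantile(.SD[[1L]])), by = by_v, .SDcols = var_w)
}

# Hypothetical alternative: bquote() replaces .(as.name(var_w)) with the
# symbol hp, so cube() receives j = as.list(quantile(hp)).
my_fun_bquote <- function(table_work, var_w, by_v) {
  eval(bquote(
    cube(table_work, j = as.list(quantile(.(as.name(var_w)))), by = by_v)
  ))
}

ans_sd <- my_fun_sd(table_work = DT, var_w = var_work, by_v = by_vars)
ans_bq <- my_fun_bquote(table_work = DT, var_w = var_work, by_v = by_vars)

# Both helpers should give the same table.
same <- isTRUE(all.equal(ans_sd, ans_bq))
```

The .SDcols version keeps the call readable and avoids eval(); the bquote() version may be handy when j has to refer to several columns by name.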
chinsoon12 answered Nov 03 '22 05:11