
Programming with `{data.table}`: how to name a new column?

Tags: r, data.table

The following question seems very basic in programming with data.table, so my apologies if it's a duplicate. I spent time researching but could not find an answer.

I want to create a "user-defined function" that wraps around a data.table wrangling procedure. In this procedure, a new column is created, and I want to let the user set the name of that new column.

Example

Consider the following code that works as-is. I want to wrap it inside a function.

library(data.table)
library(magrittr)
library(tibble)

mtcars %>%
  as.data.table() %>%
  .[, .(max_mpg = max(mpg)), by = cyl] %>%
  as_tibble()
#> # A tibble: 3 x 2
#>     cyl max_mpg
#>   <dbl>   <dbl>
#> 1     6    21.4
#> 2     4    33.9
#> 3     8    19.2

Created on 2021-10-13 by the reprex package (v0.3.0)

All I want my function to do is let the user set the new column's name through the new_colname_of_choice argument:

my_wrapper <- function(new_colname_of_choice) {
  mtcars %>%
    as.data.table() %>%
    .[, .(new_colname_of_choice = max(mpg)), by = cyl] %>%
    as_tibble()
}


my_wrapper(new_colname_of_choice = "my_lovely_colname")
#> # A tibble: 3 x 2
#>     cyl new_colname_of_choice <---------- why isn't this called "my_lovely_colname"?
#>   <dbl>                 <dbl>
#> 1     6                  21.4
#> 2     4                  33.9
#> 3     8                  19.2

I've tried using curly braces which didn't work either (actually threw an error):

my_wrapper_2 <- function(new_colname_of_choice) {
  
  mtcars %>%
    as.data.table() %>%
    .[, .({new_colname_of_choice} = max(mpg)), by = cyl] %>%
    as_tibble()
  
}

Error: unexpected '=' in: " as.data.table() %>% .[, .({new_colname_of_choice} ="

Which is surprising because curly braces do promote the desired naming ability, but in a different (yet similar) kind of code:

my_wrapper_3 <- function(new_colname_of_choice) {
  mtcars %>%
    as.data.table() %>%
    .[, {new_colname_of_choice} := max(mpg), by = cyl] %>%
    as_tibble()
}


my_wrapper_3(new_colname_of_choice = "my_lovely_colname")

## # A tibble: 32 x 12
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb my_lovely_colname <---- SUCCESS!
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>             <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4              21.4
##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4              21.4
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1              33.9
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1              21.4
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2              19.2
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1              21.4
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4              19.2
##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2              33.9
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2              33.9
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4              21.4
## # ... with 22 more rows

Bottom line

My conclusion is that the = operator is sensitive to {...} on the LHS. How else can I pass a name (taken from a function argument) to the LHS in the initial my_wrapper() example?
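For what it's worth, the difference seems to come from R's parser rather than from data.table itself. Inside a call, the name to the left of = must be a symbol or a string literal, whereas := is parsed as an ordinary binary operator whose left-hand side can be any expression. A minimal base-R sketch of that distinction (the variable names bad, ok, lhs_head and named are made up for illustration):

```r
# Inside a call, the left-hand side of `=` must be a symbol or a string
# literal, so this text fails at parse time, before any function runs.
bad <- tryCatch(
  parse(text = "list({nm} = 1)"),
  error = function(e) "parse error"
)

# `:=` is parsed as a regular binary operator, so its left-hand side
# may be an arbitrary expression, including a `{` call.
ok <- quote({nm} := 1)
lhs_head <- as.character(ok[[2]][[1]])  # head of the left-hand side call

# A base-R way to name a list element from a variable.
nm <- "my_lovely_colname"
named <- setNames(list(1), nm)
names(named)
```

This would explain why {new_colname_of_choice} is accepted in front of := but rejected in front of = inside .().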


EDIT


I'd like to add the dplyr solution to the same problem, taken from the "Programming with dplyr" vignette:

library(dplyr)

my_wrapper_dplyr <- function(new_colname_of_choice) {
  mtcars %>%
    group_by(cyl) %>%
    summarise("{new_colname_of_choice}" := max(mpg))
}

my_wrapper_dplyr("another_lovely_colname")

Which is pretty robust and works in all naming situations I've encountered. Is there a built-in/canonical practice in data.table similar to {dplyr}'s?

Emman asked Oct 13 '21 10:10


1 Answer

With data.table version 1.14.3 (the development version at the time of writing; the feature later shipped on CRAN in 1.15.0), you can use the new env parameter:

A new interface for programming on data.table has been added, closing #2655 and many other linked issues. It is built using base R's substitute-like interface via a new env argument to [.data.table. For details see the new vignette programming on data.table, and the new ?substitute2 manual page. Thanks to numerous users for filing requests, and Jan Gorecki for implementing.

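# install the development version (only needed if your installed data.table lacks the env argument)
install.packages("https://github.com/Rdatatable/data.table/archive/master.tar.gz", repos = NULL, type = "source")

library(magrittr)  # provides %>%
library(tibble)
library(data.table)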

my_wrapper_new <- function(new_colname_of_choice) {
  
  mtcars %>%
    as.data.table() %>%
    .[, .(new_colname_of_choice = max(mpg)), by = cyl,
      env = list(new_colname_of_choice = new_colname_of_choice)] %>%
    as_tibble()
  
}

my_wrapper_new('test')

# A tibble: 3 x 2
    cyl  test
  <dbl> <dbl>
1     6  21.4
2     4  33.9
3     8  19.2
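If the installed data.table does not yet have the env argument, one possible fallback is to aggregate under a temporary name and rename the result afterwards with setnames(). The sketch below assumes that approach; my_wrapper_setnames and tmp_max are hypothetical names chosen for illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical wrapper: aggregate under a placeholder name, then rename.
my_wrapper_setnames <- function(new_colname_of_choice) {
  dt <- as.data.table(mtcars)
  # tmp_max is only a temporary column name
  out <- dt[, .(tmp_max = max(mpg)), by = cyl]
  # setnames() renames by reference, taking the new name from the argument
  setnames(out, "tmp_max", new_colname_of_choice)
  out
}

res <- my_wrapper_setnames("my_lovely_colname")
names(res)
```

Keeping a literal name inside .() means the grouped j expression stays a plain max(mpg) call, and the user-supplied name is only applied once at the end.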

Waldi answered Oct 21 '22 03:10