Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use non-standard evaluation NSE to evaluate arguments on data.table?

Tags:

r

data.table

nse

Say I have the following

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))

car_list = list(cars1, cars2)
class(car_list) <- "dd"

`[.dd` <- function(x,...) {
  code = rlang::enquos(...)
  cars1 = x[[1]]
  rlang::eval_tidy(quo(cars1[!!!code]))
}

car_list[,.N, by = speed]

so I wished to perform arbitrary operations on cars1 and cars2 by defining the [.dd function so that whatever I put into ... get executed by cars1 and cars2 using the [ data.table syntax e.g.

car_list[,.N, by = speed] should perform the following

cars1[,.N, by = speed]
cars2[,.N, by = speed]

also I want

car_list[,speed*2]

to do

cars1[,speed*2]
cars2[,speed*2]

Basically, ... in [.dd has to accept arbitrary code.

somehow I need to capture the ... so I tried to do code = rlang::enquos(...) and then rlang::eval_tidy(quo(cars1[!!!code])) doesn't work and gives error

Error in [.data.table(cars1, ~, ~.N, by = ~speed) : argument "i" is missing, with no default

like image 936
xiaodai Avatar asked Jul 20 '19 08:07

xiaodai


People also ask

What is non standard evaluation?

As the name suggests, non-standard evaluation breaks away from the standard evaluation (SE) rules in order to do something special. There are three common uses of NSE: Labelling enhances plots and tables by using the expressions supplied to a function, rather than their values.

What is tidy evaluation?

Tidy evaluation is a framework for controlling how expressions and variables in your code are evaluated by tidyverse functions. This framework, housed in the rlang package, is a powerful tool for writing more efficient and elegant code.


3 Answers

While not under rlang type of mantra, this approach seems to work pretty well: lapply(dt_list, '[', ...) The code would be more readable to me as it is explicit about what method is being used. If I saw car_list[, .N, by = speed] I would expect the default data.table methods.

Making it as a function allows you to have the best of both worlds:

class(car_list) <- "dd"

`[.dd` <- function(x,...) {
 lapply(x, '[', ...)
}

car_list[, .N, speed]
car_list[, speed * 2]
car_list[, .(.N, max(dist)), speed]
car_list[, `:=` (more_speed = speed+5)]

Here are some examples of the approach:

car_list[, .N, speed]
# lapply(car_list, '[', j = .N, by = speed)
# or
# lapply(car_list, '[', , .N, speed)
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
car_list[, speed * 2]
# lapply(car_list, '[', j = speed*2)
# or
# lapply(car_list, '[', , speed*2)
[[1]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

[[2]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

car_list[, .(.N, max(dist)), speed]
# lapply(car_list, '[', j = list(.N, max(dist)), by = speed)
# or 
# lapply(car_list, '[', ,.(.N, max(dist)), speed)

[[1]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

[[2]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

This works with the := operator:

car_list[, `:=` (more_speed = speed+5)]
# or
# lapply(car_list, '[', , `:=` (more_speed = speed+5))

car_list
[[1]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13
...

[[2]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13
like image 62
Cole Avatar answered Nov 15 '22 10:11

Cole


First base R option is substitute(...()) followed by do.call:

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))
cars2[, speed := sort(speed, decreasing = TRUE)]

car_list = list(cars1, cars2)
class(car_list) <- "dd"

`[.dd` <- function(x,...) {
  a <- substitute(...()) #this is an alist
  expr <- quote(x[[i]])
  expr <- c(expr, a)
  res <- list()
  for (i in seq_along(x)) {
    res[[i]] <- do.call(data.table:::`[.data.table`, expr)
  }
  res
}

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE

Second base R option is match.call, modify the call and then evaluate (you find this approach in lm):

`[.dd` <- function(x,...) {
  thecall <- match.call()
  thecall[[1]] <- quote(`[`)
  thecall[[2]] <- quote(x[[i]])
  res <- list()
  for (i in seq_along(x)) {
    res[[i]] <- eval(thecall)
  }
  res
}

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE

I haven't tested if these approaches will make a deep copy if you use :=.

like image 25
Roland Avatar answered Nov 15 '22 10:11

Roland


The suggestion in my comment wasn't complete. You can indeed use rlang to support tidy evaluation, but since data.table itself doesn't support it directly, you're better off using expressions instead of quosures, and you need to build the complete final expression before calling eval_tidy:

`[.dd` <- function(x, ...) {
  code <- rlang::enexprs(...)
  lapply(x, function(dt) {
    ex <- rlang::expr(dt[!!!code])
    rlang::eval_tidy(ex)
  })
}

car_list[, .N, by = speed]
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1

[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1
like image 23
Alexis Avatar answered Nov 15 '22 08:11

Alexis