Is it possible to return multiple rows per group in a grouped command in data.table? In dplyr, this is done with reframe:
y <- c("a", "b", "d", "f")
df <- tibble(
g = c(1, 1, 1, 2, 2, 2, 2),
x = c("e", "a", "b", "e", "f", "c", "a")
)
library(dplyr)
df %>%
reframe(x = setdiff(x, y), .by = g)
# g x
# 1 e
# 2 e
# 2 c
In data.table, this returns an error:
library(data.table)
dt <- setDT(df)
dt[, x := setdiff(x, y), g]
Error in
[.data.table(df, ,:=(x, intersect(x, y)), g) :
Supplied 2 items to be assigned to group 1 of size 3 in column 'x'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
Anyway to get a data.table equivalent of reframe?
Wrap in .(...) and use = in place of := (because it's within .(..)).
as.data.table(df)[, .(x = setdiff(x, y)), by = g]
# g x
# <num> <char>
# 1: 1 e
# 2: 2 e
# 3: 2 c
Note that under the hood, .(.) is really just list(.), so we could also use anything that returns list-like objects, including:
as.data.table(df)[, list(x = setdiff(x, y)), by = g]
as.data.table(df)[, data.table(x = setdiff(x, y)), by = g]
as.data.table(df)[, data.frame(x = setdiff(x, y)), by = g]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With