Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to combine aggregation, filter, and by in data.table?

Tags:

r

data.table

I need to aggregate all my columns, but only if their values are 1, and group by an another columns.

Here is how I could perform this, step by step, i.e. column by column, from foo to out, the expected output.

set.seed(123)
foo <- data.table(id = rep(LETTERS[1:10], each = 10),
                  code = sample(c(111, 222, 333, 444), 100, replace = TRUE),
                  x1 = sample(0:1, 100, replace = TRUE),
                  x2 = sample(0:1, 100, replace = TRUE),
                  x3 = sample(0:1, 100, replace = TRUE),
                  x4 = sample(0:1, 100, replace = TRUE),
                  x5 = sample(0:1, 100, replace = TRUE),
                  x6 = sample(0:1, 100, replace = TRUE))



f1 <- foo[x1 == 1, .(x1 = uniqueN(code)), by = id]
f2 <- foo[x2 == 1, .(x2 = uniqueN(code)), by = id]
f3 <- foo[x3 == 1, .(x3 = uniqueN(code)), by = id]

setkey(f1, id)
setkey(f2, id)
setkey(f3, id)

out <- f1[f2,][f3,]

I'm sure there is an elegant and fast way to do this in data.table, maybe with a function and the .SDcols parameter, right?

But I can't find it...

Many thanks !!

like image 623
Discus23 Avatar asked Nov 26 '25 07:11

Discus23


2 Answers

With .SD and .SDcols:

foo[,lapply(.SD, function(x) uniqueN(code[x==1])), .SDcols = x1:x3, by = id]

    id x1 x2 x3
 1:  A  3  3  2
 2:  B  4  4  4
 3:  C  2  2  3
 4:  D  4  3  3
 5:  E  4  4  3
 6:  F  1  3  3
 7:  G  3  2  2
 8:  H  4  2  3
 9:  I  3  3  3
10:  J  3  2  2
like image 116
Waldi Avatar answered Nov 28 '25 20:11

Waldi


Using tidyverse

library(dplyr)
foo %>%
   group_by(id) %>%
   summarise(across(x1:x3, ~ n_distinct(code[.==1])))

-output

# A tibble: 10 x 4
#   id       x1    x2    x3
# * <chr> <int> <int> <int>
# 1 A         3     3     2
# 2 B         4     4     4
# 3 C         2     2     3
# 4 D         4     3     3
# 5 E         4     4     3
# 6 F         1     3     3
# 7 G         3     2     2
# 8 H         4     2     3
# 9 I         3     3     3
#10 J         3     2     2

Or can use collapse

library(collapse)
collap(slt(foo, id, x1:x3), ~ id, fNdistinct)
like image 21
akrun Avatar answered Nov 28 '25 20:11

akrun