Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Setting key while chaining in R data.table

Tags:

r

data.table

Imagine I have a data.table DT that has columns a, b, c. I want to filter rows based on a (say, select only those with value "A"), compute the sum of b by c. I can do this efficiently, using binary search for filtering, by

setkey(DT, a)
DT[.("A"), .(sum.B = sum(B)), by = .(C)]

What if then I want to filter rows based on the value of the newly obtained sum.b? If I want to keep rows where sum.b equals one of c(3, 4, 5), I can do that by saying

DT[.("A"), .(sum.B = sum(B)), by = .(C)][sum.b %in% c(3, 4, 5)]

but the latter operation uses vector scan which is slow. Is there a way to set keys "on the fly" while chaining? Ideally I would have

DT[.("A"), .(sum.B = sum(B)), by = .(C)][??set sum.b as key??][.(c(3, 4, 5))]

where I don't know the middle step.

like image 552
paljenczy Avatar asked May 08 '26 14:05

paljenczy


1 Answers

The middle step you are asking in the question would be the following:

# unnamed args
DT[,.SD,,sum.b]
# named args
DT[j = .SD, keyby = sum.b]
# semi named
DT[, .SD, keyby = sum.b]

Yet you should benchmark it on your data as it may be slower than vector scan as you need to setkey.

It looks like eddi already provide that solution in comment. The FR mentioned by him is data.table#1105.

like image 89
jangorecki Avatar answered May 10 '26 03:05

jangorecki