Imagine I have a data.table DT that has columns a, b, c. I want to filter rows based on a (say, select only those with value "A"), compute the sum of b by c. I can do this efficiently, using binary search for filtering, by
setkey(DT, a)
DT[.("A"), .(sum.B = sum(B)), by = .(C)]
What if then I want to filter rows based on the value of the newly obtained sum.b? If I want to keep rows where sum.b equals one of c(3, 4, 5), I can do that by saying
DT[.("A"), .(sum.B = sum(B)), by = .(C)][sum.b %in% c(3, 4, 5)]
but the latter operation uses vector scan which is slow. Is there a way to set keys "on the fly" while chaining? Ideally I would have
DT[.("A"), .(sum.B = sum(B)), by = .(C)][??set sum.b as key??][.(c(3, 4, 5))]
where I don't know the middle step.
The middle step you are asking in the question would be the following:
# unnamed args
DT[,.SD,,sum.b]
# named args
DT[j = .SD, keyby = sum.b]
# semi named
DT[, .SD, keyby = sum.b]
Yet you should benchmark it on your data as it may be slower than vector scan as you need to setkey.
It looks like eddi already provide that solution in comment. The FR mentioned by him is data.table#1105.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With