I am looking for a way to compute a weighted sum of several variables by group with data.table. I hope the example below is clear enough.
require(data.table)
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1,5), rep(2,5))]
dt[, w := 2]
# Error: object 'w' not found
dt[, lapply(.SD, function(x) sum(x * w)),
.SDcols = paste0("V", 1:4)]
# Error: object 'w' not found
dt[, lapply(.SD * w, sum),
.SDcols = paste0("V", 1:4)]
# This works without groups
dt[, lapply(.SD, function(x) sum(x * dt$w)),
.SDcols = paste0("V", 1:4)]
# It does not work by group: dt$w is the full-length column, not the group's subset
dt[, lapply(.SD, function(x) sum(x * dt$w)),
.SDcols = paste0("V", 1:4), keyby = gr]
# The expected result
dt[, list(V1 = sum(V1 * w),
V2 = sum(V2 * w),
V3 = sum(V3 * w),
V4 = sum(V4 * w)), keyby = gr]
### From Arun's answer
dt[, lapply(.SD[, paste0("V", 1:4), with = FALSE],
function(x) sum(x*w)), by=gr]
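One likely reason the grouped dt$w attempt in the question misbehaves is vector recycling: inside a group, x holds 5 values while dt$w still holds all 10, and R recycles the shorter vector without a warning when the lengths divide evenly. A minimal base R sketch of that effect, where x and w_full are made-up stand-ins for V1 within group 1 and for dt$w:

```r
# Hypothetical stand-ins: x mirrors V1 within group 1, w_full mirrors dt$w
x <- 1:5
w_full <- rep(2, 10)

# x is recycled to length 10, so every value is counted twice
recycled <- x * w_full
length(recycled)      # 10
sum(recycled)         # 60

# Using only the weights that belong to the group gives the intended total
sum(x * w_full[1:5])  # 30
```

This is why referring to the bare column name w inside j, which is subset per group, is preferable to dt$w.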
Copying @Roland's excellent answer:
print(dt[, lapply(.SD, function(x, w) sum(x*w), w=w), by=gr][, w := NULL])
Following @Roland's comment, it is indeed faster to run the operation on all columns and then remove the unwanted ones, as long as the operation itself is cheap, which is the case here.
dt[, {lapply(.SD, function(x) sum(x*w))}, by=gr][, w := NULL][]
For some reason, w does not seem to be found when I don't use {}. No idea why, though.
(Subsetting can be costly if there are too many groups)
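The speed claim above can be checked with a rough timing sketch. Everything here is an assumption made for illustration: the table big, its size, the number of groups and the extra columns V5 and V6 are invented, and the timings will vary with the machine and the data.table version.

```r
library(data.table)

# Hypothetical benchmark data: 6 value columns, 1000 groups, constant weight
set.seed(1)
n <- 1e5
big <- data.table(matrix(runif(n * 6), nrow = n))  # columns V1..V6
big[, gr := sample(1000L, n, replace = TRUE)]
big[, w := 2]
cols <- paste0("V", 1:4)

# (a) subset .SD inside j, once per group
t_subset <- system.time(
  a <- big[, lapply(.SD[, cols, with = FALSE], function(x) sum(x * w)), keyby = gr]
)

# (b) weight every column, then drop the ones that are not wanted
t_all <- system.time({
  b <- big[, lapply(.SD, function(x) sum(x * w)), keyby = gr]
  b[, c("V5", "V6", "w") := NULL]
})

# Both routes should agree; only the elapsed times are expected to differ
print(all.equal(a, b))
print(c(subset_in_j = t_subset[["elapsed"]], all_then_drop = t_all[["elapsed"]]))
```

With many groups, route (a) calls the data.table subsetting code once per group, which is the cost the remark about subsetting refers to.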
You can do this without using .SDcols by subsetting .SD as you pass it to lapply, as follows:
dt[, lapply(.SD[, paste0("V", 1:4), with=FALSE], function(x) sum(x*w)), by=gr]
#    gr V1  V2  V3  V4
# 1:  1 30 130 230 330
# 2:  2 80 180 280 380
.SDcols builds .SD without the w column, so it is not possible to multiply by w: it does not exist within the scope of the .SD environment.
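A small sketch of this point, reusing the question's toy table. The first call only lists what .SD contains when .SDcols is given. The workaround that follows (listing w in .SDcols as well, then dropping the weighted w column) is an assumption added for illustration and is not taken from the answers.

```r
library(data.table)

# Same toy data as in the question: V1..V20, two groups, constant weight 2
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1, 5), rep(2, 5))]
dt[, w := 2]

cols <- paste0("V", 1:4)

# With .SDcols, .SD holds only the listed columns, so w is not among them
sd_names <- dt[, names(.SD), .SDcols = cols]
print(sd_names)

# Assumed workaround: put w in .SDcols too, so it is a column of .SD
res <- dt[, lapply(.SD, function(x) sum(x * w)), by = gr, .SDcols = c(cols, "w")]

# The w column of the result is sum(w * w), which is not wanted
res[, w := NULL]
print(res)
```

The result has one row per group with the weighted sums of V1 to V4 only.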