Weighted sum of variables by groups with data.table

I am looking for a way to compute weighted sums of several variables by group with data.table. I hope the example below is clear enough.

library(data.table)

dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1,5), rep(2,5))]
dt[, w := 2]

# Error: object 'w' not found
dt[, lapply(.SD, function(x) sum(x * w)),
   .SDcols = paste0("V", 1:4)]

# Error: object 'w' not found
dt[, lapply(.SD * w, sum),
   .SDcols = paste0("V", 1:4)]

# This works without groups
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4)]

# It does not work by groups: dt$w is the full 10-row column, not the group's rows
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4), keyby = gr]

# The result to be expected
dt[, list(V1 = sum(V1 * w),
          V2 = sum(V2 * w),
          V3 = sum(V3 * w),
          V4 = sum(V4 * w)), keyby = gr]

### from Arun's answer
dt[, lapply(.SD[, paste0("V", 1:4), with = FALSE],
            function(x) sum(x*w)), by=gr]
djhurio asked Jul 19 '13 10:07



1 Answer

Final attempt (copying Roland's answer :))

Copying @Roland's excellent answer:

print(dt[, lapply(.SD, function(x, w) sum(x*w), w=w), by=gr][, w := NULL])
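As a quick check of this pattern, here is a minimal sketch on a small hypothetical table (`toy` and its values are made up for illustration and are not the question's data). It compares the `lapply` version, with `w` passed as an explicit argument, against the sums written out column by column.

```r
library(data.table)

# Hypothetical toy data (not the question's table): two value columns,
# a grouping column and a non-constant weight column.
toy <- data.table(
  V1 = c(1, 2, 3, 4),
  V2 = c(10, 20, 30, 40),
  gr = c(1, 1, 2, 2),
  w  = c(1, 2, 3, 4)
)

# Weighted sums over every non-grouping column, with w passed explicitly.
# The w column of the result (sum of w * w) is dropped afterwards.
res <- toy[, lapply(.SD, function(x, w) sum(x * w), w = w), by = gr][, w := NULL]

# Reference result written out column by column.
ref <- toy[, list(V1 = sum(V1 * w), V2 = sum(V2 * w)), by = gr]

print(res)
print(ref)
```

Both tables should hold the same values: 5 and 25 for V1, 50 and 250 for V2.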

Second attempt (still not the most efficient one):

Following @Roland's comment, it's indeed faster to do the operation on all columns and then just remove the unwanted ones (as long as the operation itself is not time consuming, which is the case here).

dt[, {lapply(.SD, function(x) sum(x*w))}, by=gr][, w := NULL][]

For some reason, w is not found when I don't use {}. I have no idea why, though.


Old (inefficient) answer:

(Subsetting can be costly if there are too many groups)

You can do this without .SDcols by dropping the unwanted column from .SD as you pass it to lapply:

dt[, lapply(.SD[, -1, with=FALSE], function(x) sum(x*w)), by=gr]
#    gr V1  V2  V3  V4
# 1:  1 20 120 220 320
# 2:  2 70 170 270 370

.SDcols restricts .SD to the listed columns, so .SD no longer contains the w column. Multiplying by w is then not possible, because w does not exist within the scope of the .SD environment.
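The effect of .SDcols on the contents of .SD can be inspected directly. The following is a small sketch on a hypothetical table (`toy` is an illustrative name, not the question's data). It only lists the column names that .SD holds in each case and makes no claim about how a particular data.table version resolves w inside j.

```r
library(data.table)

# Hypothetical toy table (column names chosen for illustration only).
toy <- data.table(V1 = 1:4, V2 = 5:8, gr = c(1, 1, 2, 2), w = 2)

# Without .SDcols (and without by), .SD holds every column, including w.
all_cols <- toy[, names(.SD)]

# With .SDcols, .SD holds only the listed columns, so w is not part of it.
sd_cols <- toy[, names(.SD), .SDcols = c("V1", "V2")]

print(all_cols)
print(sd_cols)
```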

Arun answered Sep 19 '22 12:09
