I have a question connected to one I asked previously: Assignment of a value from a foreach loop. I found that although the solutions friendly users provided point in the right direction, they don't solve my actual problem. Here is the sample data set:
td <- data.table(date=c(rep(1,10),rep(2,10)),var=c(rep(1,4),2,rep(1,5)),id=rep(1:10,2))
It is the same as before, but it reflects my real data better. What I want to do, in words: for each id, I want the mean of var over all other ids within a given period (e.g. mean(td[date=="2004-01-01" & id!=1]$var), but for all periods and all ids). So it is some kind of nested operation. I tried something like this:
td[,.SD[,mean(.SD$var[-.I]),by=id],by=date]
But that doesn't give the right results.
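To make the target concrete, the desired result can be written out by brute force. This is only a sketch for checking answers against, not an efficient solution: `loo_mean` and `expected` are hypothetical names introduced here, and `var` is repeated explicitly for both dates rather than relying on recycling.

```r
library(data.table)

# Same toy data as in the question, with `var` repeated explicitly for both dates
td <- data.table(date = rep(c(1, 2), each = 10),
                 var  = rep(c(rep(1, 4), 2, rep(1, 5)), 2),
                 id   = rep(1:10, 2))

# Hypothetical helper: mean of `var` over all *other* ids on the same date
loo_mean <- function(d, i) mean(td$var[td$date == d & td$id != i])

# One row per (date, id) pair, filled in one pair at a time
expected <- unique(td[, .(date, id)])
expected[, others_mean := mapply(loo_mean, date, id)]
expected[date == 1]
```

On this data only id 5 has var equal to 2, so its "others" mean is 1, while every other id gets 10/9.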
Josh very intelligently suggested using `.BY` instead of `.GRP` (`td` must be keyed by `id` for this join to work):
td[, td[!.BY, mean(var), by=date], by=id]
If you key by id, you can use .GRP in the following way:
setkey(td, id)
## grab all the unique IDs. Only necessary if not all ids are
## represented in all dates
uid <- unique(td$id)
td[, td[!.(uid[.GRP]), mean(var), by=date] , by=id]
    id date       V1
 1:  1    1 1.111111
 2:  1    2 1.111111
 3:  2    1 1.111111
 4:  2    2 1.111111
 5:  3    1 1.111111
 6:  3    2 1.111111
 7:  4    1 1.111111
 8:  4    2 1.111111
 9:  5    1 1.000000
10:  5    2 1.000000
11:  6    1 1.111111
12:  6    2 1.111111
13:  7    1 1.111111
14:  7    2 1.111111
15:  8    1 1.111111
16:  8    2 1.111111
17:  9    1 1.111111
18:  9    2 1.111111
19: 10    1 1.111111
20: 10    2 1.111111
Does this do it?
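DT[, {
  vbar <- mean(var)  # mean of var over all rows on this date
  n <- .N            # number of rows on this date
  # remove this id's sum and row count from the date totals
  .SD[, (n*vbar - sum(var)) / (n - .N), by=id]
}, by='date']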
EDIT (Reply to @Arun's comment): The cryptic expression in the middle is the solution to (pseudocode)
mean(everything) = weight(this)*mean(this) + weight(others)*mean(others)
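Spelled out, the algebra runs as follows. The symbols are introduced here for illustration: on a given date, $N$ is the total number of rows (`n` in the code), $\bar v$ is the overall mean (`vbar`), $n_i$ is the number of rows for id $i$ (the inner `.N`), $\bar v_i$ is the mean for id $i$, and $\bar v_{-i}$ is the mean over all other ids.

```latex
\begin{align*}
\bar v &= \frac{n_i}{N}\,\bar v_i + \frac{N - n_i}{N}\,\bar v_{-i} \\
N \bar v &= n_i \bar v_i + (N - n_i)\,\bar v_{-i} \\
\bar v_{-i} &= \frac{N \bar v - n_i \bar v_i}{N - n_i}
\end{align*}
```

Since $n_i \bar v_i$ is just the sum of `var` for id $i$, the last line is exactly `(n*vbar - sum(var)) / (n - .N)`.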
EDIT2 (benchmarking): I prefer Josh/Richardo's answer, but this bit of algebra reduces the number of computations, for when that matters:
require(microbenchmark)
setkey(DT,id)
microbenchmark(
  algebra=DT[,{
    vbar <- mean(var)
    n <- .N
    .SD[,(n*vbar-sum(var))/(n-.N),by=id]
  },by='date'],
  bybyby=DT[, DT[!.BY, mean(var), by=date], by=id]
)
# Unit: milliseconds
#     expr       min        lq    median       uq       max neval
#  algebra  6.448764  6.920922  7.083707  7.38093  64.36238   100
#   bybyby 37.778504 39.425788 41.628918 44.26533 130.85040   100
The user would probably have their DT keyed already, but if not, that also carries a slight cost, I guess.
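Timings only mean something if both expressions return the same numbers, so a quick check along these lines may be worthwhile. It is a sketch that assumes `DT` holds the question's sample data (rebuilt here so the block stands alone); `algebra` and `bybyby` are reused as object names purely for convenience.

```r
library(data.table)

# Assumption: DT is the question's toy table, keyed by id as in the benchmark
DT <- data.table(date = rep(c(1, 2), each = 10),
                 var  = rep(c(rep(1, 4), 2, rep(1, 5)), 2),
                 id   = rep(1:10, 2))
setkey(DT, id)

# Algebraic version: subtract each id's contribution from the per-date totals
algebra <- DT[, {
  vbar <- mean(var)
  n <- .N
  .SD[, (n * vbar - sum(var)) / (n - .N), by = id]
}, by = date]

# Not-join version: for each id, drop its rows and average the rest by date
bybyby <- DT[, DT[!.BY, mean(var), by = date], by = id]

# The two results come back in different row orders, so align them first
setorder(algebra, id, date)
setorder(bybyby, id, date)
all.equal(algebra$V1, bybyby$V1)
```

Note that the algebraic form divides by `n - .N`, so it is undefined for a date on which only a single id appears.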