I have the data.table shown below and I'm trying to calculate the weighted mean for subsets of the data. I've tried two approaches, both shown in the MWE below:
library(data.table)
set.seed(12345)
dt = data.table(a = c(10,20,25,10,10), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
dt$key = sample(toupper(letters[1:3]), 5, replace = TRUE)
setkey(dt, key)
First, subsetting the .SD and using an lapply call, which doesn't work (and wasn't really expected to):
dt[,lapply(.SD,function(x) weighted.mean(x,.SD[1])),by=key]
Second, defining a function to apply to the .SD, as I would if I were using ddply. This fails too:
wmn=function(x){
  tmp = NULL
  for(i in 2:ncol(x)){
    tmp1 = weighted.mean(x[,i],x[,1])
    tmp = c(tmp,tmp1)
  }
  return(tmp)
}
dt[,wmn,by=key]
Any thoughts on how best to do this?
Thanks
EDIT
Fixed an error in the columns selected in the wmn function.
SECOND EDIT
Weighted Mean formula reversed back and added set.seed
The weighted mean is the sum of the products of each weight and its value, divided by the sum of the weights. If all the weights are equal, the weighted mean and the arithmetic mean are the same. If the weights are expressed as proportions, they sum to 1 and the denominator drops out.
Formula: $\bar{x}_w = \dfrac{\sum_i w_i x_i}{\sum_i w_i}$
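As a quick sanity check of that definition, here is a minimal base R sketch. The vectors `x` and `w` are made-up illustrative values, not taken from the question's data.

```r
# Illustrative values and weights (assumed for this sketch, not from the question)
x <- c(1, 2, 4)
w <- c(10, 20, 10)

# Weighted mean by the definition: sum of weight * value over sum of weights
manual <- sum(w * x) / sum(w)

# Base R's weighted.mean() computes the same quantity
builtin <- weighted.mean(x, w)

# With equal weights, the weighted mean reduces to the arithmetic mean
equal <- weighted.mean(x, rep(1, length(x)))
```

Here `manual` and `builtin` agree, and `equal` matches `mean(x)`.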
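If you want to take the weighted means of "b"..."e" using "a" as the weight, I think this does the trick (note that .SDcols=letters[1:5] also includes "a" itself, so the result has an "a" column as well; use letters[2:5] to return only "b" to "e"):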
dt[,lapply(.SD,weighted.mean,w=a),by=key,.SDcols=letters[1:5]]
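To see the same pattern on numbers that are easy to verify by hand, here is a small sketch. The table `toy`, its grouping column `grp`, and all of its values are hypothetical and chosen only for illustration. The weight column `a` is left out of `.SDcols` but can still be referenced in `j`.

```r
library(data.table)

# Small made-up table: `a` holds the weights, `b` and `c` the values
toy <- data.table(
  grp = c("A", "A", "B"),
  a   = c(1, 3, 2),
  b   = c(10, 20, 5),
  c   = c(0, 4, 7)
)

# Weighted mean of b and c within each group, weighted by a
res <- toy[, lapply(.SD, weighted.mean, w = a), by = grp, .SDcols = c("b", "c")]

# Same calculation by hand for column b in group "A":
# (1*10 + 3*20) / (1 + 3) = 17.5
by_hand <- toy[grp == "A", sum(a * b) / sum(a)]
```

`res` has one row per group, and its `b` value for group "A" should match `by_hand`.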