
Calculating a weighted mean using data.table in R with weights in one of the table columns

Tags:

r

data.table

I have the data.table shown below and I'm trying to calculate the weighted mean for subsets of the data. I've tried two approaches with the MWE below:

    library(data.table)
    set.seed(12345)
    dt = data.table(a = c(10,20,25,10,10), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
    dt$key = sample(toupper(letters[1:3]), 5, replace = TRUE)
    setkey(dt, key)

First, subsetting .SD and using an lapply call, which doesn't work (and wasn't really expected to):

    dt[,lapply(.SD,function(x) weighted.mean(x,.SD[1])),by=key]

Second, trying to define a function to apply to .SD, as I would if I were using ddply. This fails too:

    wmn=function(x){
      tmp = NULL
      for(i in 2:ncol(x)){
        tmp1 = weighted.mean(x[,i],x[,1])
        tmp = c(tmp,tmp1)
      }
      return(tmp)
    }

    dt[,wmn,by=key]

Any thoughts on how best to do this?

Thanks

EDIT

Fixed an error in the columns selected in the wmn function.

SECOND EDIT

Reversed the weighted mean formula back and added set.seed.

Tahnoon Pasha asked May 20 '13 03:05



1 Answer

If you want to take the weighted means of "b"..."e" using "a" as the weight, I think this does the trick:

    dt[,lapply(.SD,weighted.mean,w=a),by=key,.SDcols=letters[1:5]]
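
Note that `.SDcols=letters[1:5]` also includes the weight column "a" in the result. If only "b"..."e" are wanted, a minimal sketch along the same lines might look like the following. It rebuilds the question's example data, restricts `.SD` to the value columns, and cross-checks one column against the definition sum(w * x) / sum(w). The names `value_cols`, `res`, and `chk` are illustrative and not from the original post.

    library(data.table)

    # Rebuild the example data from the question
    set.seed(12345)
    dt <- data.table(a = c(10, 20, 25, 10, 10),
                     b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
    dt[, key := sample(toupper(letters[1:3]), 5, replace = TRUE)]
    setkey(dt, key)

    # Only b..e go into .SD; the weight column "a" is still visible by name in j
    value_cols <- c("b", "c", "d", "e")  # illustrative name
    res <- dt[, lapply(.SD, weighted.mean, w = a), by = key, .SDcols = value_cols]

    # Cross-check column b against sum(w * x) / sum(w) within each group
    chk <- dt[, .(b = sum(a * b) / sum(a)), by = key]
    all.equal(res$b, chk$b)

Because `a` is referenced by name rather than through `.SD`, it is evaluated per group, so each group's values are weighted by that group's own weights.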
Frank answered Sep 23 '22 06:09