Calculating a weighted mean using data.table in R with weights in one of the table columns

Tags: r, data.table

I have a data.table shown below. I'm trying to calculate the weighted mean for subsets of the data. I've tried two approaches with the MWE below:

    library(data.table)

    set.seed(12345)
    dt = data.table(a = c(10,20,25,10,10), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
    dt$key = sample(toupper(letters[1:3]), 5, replace = TRUE)
    setkey(dt, key)

First, I tried subsetting .SD inside an lapply call, which doesn't work (and wasn't really expected to):

    dt[,lapply(.SD,function(x) weighted.mean(x,.SD[1])),by=key]

Second, I tried defining a function to apply to .SD, as I would if I were using ddply. This fails too:

    wmn=function(x){
      tmp = NULL
      for(i in 2:ncol(x)){
        tmp1 = weighted.mean(x[,i],x[,1])
        tmp = c(tmp,tmp1)
      }
      return(tmp)
    }

    dt[,wmn,by=key]

Any thoughts on how best to do this?

Thanks

EDIT

Fixed an error in the columns selected in the wmn formula.

SECOND EDIT

Reversed the weighted mean formula back and added set.seed.


asked May 20 '13 03:05

Tahnoon Pasha

1 Answer

If you want to take the weighted means of "b"..."e" using "a" as the weight, I think this does the trick:

    dt[,lapply(.SD,weighted.mean,w=a),by=key,.SDcols=letters[1:5]]
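
In case it helps, here is a slightly longer sketch of the same idea on the question's data. It assumes you only want "b" to "e" in the output, so `.SDcols` leaves out "a", which should still be visible inside `j` as the weight. The names `value_cols`, `res` and `check` are just mine for illustration. It also cross-checks one column against `sum(w * x) / sum(w)`:

```r
library(data.table)

# Same example data as in the question.
set.seed(12345)
dt <- data.table(a = c(10, 20, 25, 10, 10),
                 b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
dt[, key := sample(toupper(letters[1:3]), 5, replace = TRUE)]
setkey(dt, key)

# Assumption: only the value columns should appear in the result, so the
# weight column "a" is left out of .SDcols but still used as w.
value_cols <- c("b", "c", "d", "e")
res <- dt[, lapply(.SD, weighted.mean, w = a), by = key, .SDcols = value_cols]

# Cross-check column "b" against the plain formula sum(w * x) / sum(w).
check <- dt[, .(b_manual = sum(a * b) / sum(a)), by = key]
stopifnot(isTRUE(all.equal(res$b, check$b_manual)))

print(res)
```

As far as I can tell, the first attempt in the question fails because `.SD[1]` picks the first row of each group's sub-table rather than the column "a", and the second because `dt[,wmn,by=key]` passes the function object as `j` without ever calling it on `.SD`.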


answered Sep 23 '22 06:09

Frank
