weighted mean in dplyr for multiple columns

Question

I'm trying to calculate the weighted mean for multiple columns using dplyr. At the moment I'm stuck with summarize_each, which seems to me to be part of the solution. Here's some example code:

library(dplyr)
f2a <- c(1,0,0,1)
f2b <- c(0,0,0,1)
f2c <- c(1,1,1,1)
clustervar <- c("A","B","B","A")
weight <- c(10,20,30,40)

df <- data.frame (f2a, f2b, f2c, clustervar, weight, stringsAsFactors=FALSE)
df

What I am looking for is something like:

df %>%
  group_by (clustervar) %>%
  summarise_each(funs(weighted.mean(weight)), select=cbind(clustervar, f2a:f2c))

The result of this is only:

# A tibble: 2 × 4
  clustervar select4 select5 select6
       <chr>   <dbl>   <dbl>   <dbl>
1          A      25      25      25
2          B      25      25      25

What am I missing here?

akrun · Accepted Answer

We can reshape the data to 'long' format, compute the weighted mean per group and variable, and then spread the result back to 'wide' format:

library(tidyverse)
gather(df, Var, Val, f2a:f2c) %>% 
        group_by(clustervar, Var) %>% 
        summarise(wt =weighted.mean(Val, weight)) %>%
        spread(Var, wt)
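
With tidyr 1.0.0 or later, gather and spread are superseded by pivot_longer and pivot_wider. A minimal sketch of the same reshape, assuming those versions (and dplyr 1.0.0 or later for the .groups argument) are installed; res_long is just an illustrative name:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  f2a = c(1, 0, 0, 1),
  f2b = c(0, 0, 0, 1),
  f2c = c(1, 1, 1, 1),
  clustervar = c("A", "B", "B", "A"),
  weight = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

res_long <- df %>%
  # long format: one row per (original row, variable) pair
  pivot_longer(f2a:f2c, names_to = "Var", values_to = "Val") %>%
  group_by(clustervar, Var) %>%
  # weighted mean of each variable within each cluster
  summarise(wt = weighted.mean(Val, weight), .groups = "drop") %>%
  # back to wide format: one column per variable
  pivot_wider(names_from = Var, values_from = wt)

res_long
```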

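Another option is summarise_each, passing each column as the first argument (.) of weighted.mean and weight as the second: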

df %>%
    group_by(clustervar) %>% 
    summarise_each(funs(weighted.mean(., weight)), matches("^f"))
# A tibble: 2 × 4     
#    clustervar   f2a   f2b   f2c
#         <chr> <dbl> <dbl> <dbl>
# 1          A     1   0.8     1
# 2          B     0   0.0     1
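
As a sanity check on the 0.8 for f2b in cluster A: that group holds the rows with f2b values 0 and 1 and weights 10 and 40, so the weighted mean is (0*10 + 1*40) / (10 + 40). A small base R sketch of that arithmetic (vals, w, manual and builtin are illustrative names):

```r
# f2b values and weights for the two rows of cluster "A"
vals <- c(0, 1)
w <- c(10, 40)

# weighted mean by hand: sum of value * weight over sum of weights
manual <- sum(vals * w) / sum(w)

# the same via stats::weighted.mean(x, w): values first, weights second
builtin <- weighted.mean(vals, w)

c(manual = manual, builtin = builtin)
```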

Or with summarise_at and matches (a variation of another answer, which I hadn't seen when posting this):

df %>% 
   group_by(clustervar) %>% 
   summarise_at(vars(matches('f2')), funs(weighted.mean(., weight)))
# A tibble: 2 × 4
#   clustervar   f2a   f2b   f2c
#        <chr> <dbl> <dbl> <dbl>
#1          A     1   0.8     1
#2          B     0   0.0     1
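
For readers on dplyr 1.0.0 or later, where funs() and summarise_each() are deprecated and summarise_at() is superseded, the same idea can be sketched with across(). This assumes a recent dplyr; res is just an illustrative name:

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  f2a = c(1, 0, 0, 1),
  f2b = c(0, 0, 0, 1),
  f2c = c(1, 1, 1, 1),
  clustervar = c("A", "B", "B", "A"),
  weight = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(clustervar) %>%
  # .x is the current column; weight is the weight column of the current group
  summarise(across(matches("^f2"), ~ weighted.mean(.x, weight)))

res
```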

Or another option is data.table

library(data.table)
setDT(df)[, lapply(.SD, function(x) weighted.mean(x, weight)),
                       by = clustervar, .SDcols  = f2a:f2c]
#    clustervar f2a f2b f2c
#1:          A   1 0.8   1
#2:          B   0 0.0   1
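
Note that setDT converts df to a data.table by reference, so df itself is changed. A sketch that leaves df untouched, assuming an extra copy via as.data.table is acceptable; dt, cols and res_dt are illustrative names:

```r
library(data.table)

# Same example data as in the question
df <- data.frame(
  f2a = c(1, 0, 0, 1),
  f2b = c(0, 0, 0, 1),
  f2c = c(1, 1, 1, 1),
  clustervar = c("A", "B", "B", "A"),
  weight = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so df stays a plain data.frame
dt <- as.data.table(df)

# columns to summarise, given explicitly as a character vector
cols <- c("f2a", "f2b", "f2c")

res_dt <- dt[, lapply(.SD, function(x) weighted.mean(x, weight)),
             by = clustervar, .SDcols = cols]

res_dt
```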

NOTE: The gather/spread, summarise_each, summarise_at and data.table approaches all use valid tidyverse/data.table syntax and give the expected output.

We can also create a function that makes use of the syntax from the devel version of dplyr (soon to be released as 0.6.0). enquo does a job similar to substitute: it takes an input argument and converts it to a quosure. Within group_by/summarise/mutate, we evaluate the quosure by unquoting it (UQ or !!):

wtFun <- function(dat, pat, wtcol, grpcol){
    wtcol <- enquo(wtcol)
    grpcol <- enquo(grpcol)
    dat %>%
        group_by(!!grpcol) %>%
        summarise_at(vars(matches(pat)), funs(weighted.mean(., !!wtcol)))
}

wtFun(df, "f2", weight, clustervar)
# A tibble: 2 × 4
#   clustervar   f2a   f2b   f2c
#       <chr> <dbl> <dbl> <dbl>
#1          A     1   0.8     1
#2          B     0   0.0     1
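
On current releases (assuming dplyr 1.0.0 or later together with rlang 0.4.0 or later), a similar wrapper can be sketched with the {{ }} embrace operator and across(), which replaces the explicit enquo/!! pair and the deprecated funs(). This is a sketch under those assumptions, not the code from the answer above; wt_fun and res_fun are hypothetical names:

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  f2a = c(1, 0, 0, 1),
  f2b = c(0, 0, 0, 1),
  f2c = c(1, 1, 1, 1),
  clustervar = c("A", "B", "B", "A"),
  weight = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# pat: regular expression selecting the columns to summarise
# wtcol, grpcol: unquoted column names for the weights and the grouping
wt_fun <- function(dat, pat, wtcol, grpcol) {
  dat %>%
    group_by({{ grpcol }}) %>%
    summarise(across(matches(pat), ~ weighted.mean(.x, {{ wtcol }})))
}

res_fun <- wt_fun(df, "f2", weight, clustervar)
res_fun
```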
