How to use dplyr to calculate a weighted mean of two grouped variables

Question

I know this must be super easy, but I'm having trouble finding the right dplyr commands to do this. Let's say I want to group a dataset by two variables, and then summarize the count for each row. For this we simply have:

mtcars %>% group_by(cyl, mpg) %>% summarize(Count = n())

This will generate a dataframe with 27 rows for the three variables cyl, mpg, and Count. What I'd like to do next is summarize the average mpg for each of the three cyl values. Keep in mind that each row may contain a Count greater than one which must be considered when calculating the average. My data frame should have 3 rows of 2 variables cyl, and Avg_mpg. Can someone give me the short code chuck that will do this? Thank you in advance.

Ronak Shah · Accepted Answer

If I have understood you correctly, you need weighted.mean

library(dplyr)
mtcars %>% 
   group_by(cyl, mpg) %>% 
   summarize(Count = n()) %>%
   group_by(cyl) %>%
   summarise(avg_mpg = weighted.mean(mpg, Count))

# A tibble: 3 x 2
#    cyl   avg_mpg
#  <dbl>   <dbl>
#1  4.00    26.7
#2  6.00    19.7
#3  8.00    15.1

which is equivalent to

mtcars %>% 
  group_by(cyl, mpg) %>% 
  summarize(Count = n()) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = sum(mpg * Count)/sum(Count))

How to use dplyr to calculate a weighted mean of two grouped variables

Tags:

r

dplyr

summarize

weighted-average

split-apply-combine

ds_guy

Video Answer

1 Answers

Ronak Shah

Recent Activity

Donate For Us

How to use dplyr to calculate a weighted mean of two grouped variables

Tags:

r

dplyr

summarize

weighted-average

split-apply-combine

ds_guy

Video Answer

1 Answers

Ronak Shah

Related questions

Recent Activity

Donate For Us