I know this must be super easy, but I'm having trouble finding the right dplyr commands to do this. Let's say I want to group a dataset by two variables, and then summarize the count for each row. For this we simply have:
mtcars %>% group_by(cyl, mpg) %>% summarize(Count = n())
This will generate a dataframe with 27 rows for the three variables cyl
, mpg
, and Count
. What I'd like to do next is summarize the average mpg
for each of the three cyl
values. Keep in mind that each row may contain a Count
greater than one which must be considered when calculating the average. My data frame should have 3 rows of 2 variables cyl
, and Avg_mpg
. Can someone give me the short code chuck that will do this?
Thank you in advance.
If I have understood you correctly, you need weighted.mean
library(dplyr)
mtcars %>%
group_by(cyl, mpg) %>%
summarize(Count = n()) %>%
group_by(cyl) %>%
summarise(avg_mpg = weighted.mean(mpg, Count))
# A tibble: 3 x 2
# cyl avg_mpg
# <dbl> <dbl>
#1 4.00 26.7
#2 6.00 19.7
#3 8.00 15.1
which is equivalent to
mtcars %>%
group_by(cyl, mpg) %>%
summarize(Count = n()) %>%
group_by(cyl) %>%
summarise(avg_mpg = sum(mpg * Count)/sum(Count))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With