Display weighted mean by group in the data.frame

Tags:

weighted-average

Questions about by and weighted.mean already exist, but none of them helped me solve my problem. I am new to R and am more used to data-mining languages than to programming.

I have a data frame that holds, for each individual (observation/row), the income, education level and sample weight. I want to calculate the weighted mean of income by education level, and I want the result attached to each individual in a new column of my original data frame, like this:

obs income education weight incomegroup
1.   1000      A       10    --> display weighted mean of income for education level A
2.   2000      B        1    --> display weighted mean of income for education level B
3.   1500      B        5    --> display weighted mean of income for education level B
4.   2000      A        2    --> display weighted mean of income for education level A

I tried:

data$incomegroup=by(data$education, function(x) weighted.mean(data$income, data$weight))

It does not work. A weighted mean is calculated and appears in the column "incomegroup", but I cannot tell whether it is for the whole data set or for one group only; it is not computed by group. I have read about the plyr package and the aggregate function, but they do not seem to do what I want.

The ave function (from the stats package) gives exactly what I am looking for, but only for a simple mean:

data$incomegroup=ave(data$income,data$education,FUN = mean)

It cannot be used with weights.

Thank you in advance for your help!

asked Jul 21 '16 16:07

Elixterra

1 Answer

Try using the dplyr package as follows:

df <- read.table(text = 'obs income education weight   
                          1   1000      A       10     
                          2   2000      B        1     
                          3   1500      B        5     
                          4   2000      A        2', 
                 header = TRUE)     

library(dplyr)

df_summary <- 
  df %>% 
  group_by(education) %>% 
  summarise(weighted_income = weighted.mean(income, weight))

df_summary
# education weighted_income
#     A        1166.667
#     B        1583.333

df_final <- left_join(df, df_summary, by = 'education')

df_final
# obs income education weight weighted_income
#  1   1000         A     10        1166.667
#  2   2000         B      1        1583.333
#  3   1500         B      5        1583.333
#  4   2000         A      2        1166.667
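
If you would rather stay in base R, the ave approach from the question can probably be adapted. This is only a sketch, not part of the dplyr solution above: ave hands a single vector to FUN, so the idea is to pass row indices and look up income and weight inside the function. The column name incomegroup is taken from the question; the rest of the data is the same toy example.

```r
# Same toy data as above, built directly as a data frame.
df <- data.frame(
  obs       = 1:4,
  income    = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight    = c(10, 1, 5, 2)
)

# ave() splits the row indices by education level; for each group,
# FUN receives the indices of its rows and uses them to pick the
# matching incomes and weights. The group result is recycled to
# every row of that group.
df$incomegroup <- ave(
  seq_len(nrow(df)),
  df$education,
  FUN = function(i) weighted.mean(df$income[i], df$weight[i])
)

df
```

Within dplyr, replacing summarise with mutate after group_by should give the same column in one step, without the join.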

answered Sep 30 '22 09:09

Alex Ioannides
