Display weighted mean by group in the data.frame

Questions about by() and weighted.mean() already exist, but none of them helped me solve my problem. I am new to R and more used to data-mining tools than to programming.

I have a data frame with, for each individual (observation/row), the income, education level and sample weight. I want to calculate the weighted mean of income by education level and attach the result to each individual in a new column of my original data frame, like this:

obs income education weight incomegroup
1   1000   A         10     weighted mean of income for education level A
2   2000   B         1      weighted mean of income for education level B
3   1500   B         5      weighted mean of income for education level B
4   2000   A         2      weighted mean of income for education level A
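For reference, the value wanted in the incomegroup column is the ordinary weighted mean taken within each education group. Writing x_i for income, w_i for weight and e_i for education level (notation introduced here for illustration only), the group value and the worked numbers for the four example rows are:

```latex
\bar{x}_g = \frac{\sum_{i : e_i = g} w_i x_i}{\sum_{i : e_i = g} w_i}

\bar{x}_A = \frac{10 \cdot 1000 + 2 \cdot 2000}{10 + 2} = \frac{14000}{12} \approx 1166.67

\bar{x}_B = \frac{1 \cdot 2000 + 5 \cdot 1500}{1 + 5} = \frac{9500}{6} \approx 1583.33
```

Rows 1 and 4 should therefore both receive about 1166.67, and rows 2 and 3 about 1583.33.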

I tried:

data$incomegroup=by(data$education, function(x) weighted.mean(data$income, data$weight))    

It does not work. A weighted mean is calculated and appears in column "incomegroup", but it seems to be computed either for the whole data set or for only one group, not by group; I can't tell which. I read about the plyr package and the aggregate function, but they do not seem to do what I need.
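A likely reason for the whole-sample result is that the function in the by() attempt refers to data$income and data$weight, which are always the full columns, rather than to the rows of the current group. As a base-R sketch, the version below swaps by() for split() plus sapply() (a substitute technique, not the asker's call repaired) so that each group's rows arrive as d; it assumes the columns are named income, education and weight as in the example table.

```r
# Example data from the question (column names assumed from the table above).
data <- data.frame(
  income    = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight    = c(10, 1, 5, 2)
)

# split() gives one sub-data-frame per education level; the function then
# uses d$income and d$weight, i.e. only that group's rows.
group_means <- sapply(split(data, data$education),
                      function(d) weighted.mean(d$income, d$weight))

# group_means is a named vector (names A, B); indexing it by each row's
# education level copies the group mean onto every matching row.
data$incomegroup <- unname(group_means[data$education])
```

Indexing by name keeps the new column aligned with the original row order, whatever order the groups appear in.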

The ave() function from the stats package gives exactly what I am looking for, but only for the simple (unweighted) mean:

data$incomegroup=ave(data$income,data$education,FUN = mean)

But it cannot be used directly with weights, because FUN only receives the income values of each group.
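A common base-R workaround is to let ave() split row indices instead of the income values, and then use those indices inside FUN to look up both income and weight. A minimal sketch, assuming the same column names as in the example table:

```r
# Example data from the question (column names assumed from the table above).
data <- data.frame(
  income    = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight    = c(10, 1, 5, 2)
)

# ave() only passes FUN the values of its first argument, so give it row
# indices; within each education group, i holds that group's row numbers.
data$incomegroup <- ave(
  seq_len(nrow(data)), data$education,
  FUN = function(i) weighted.mean(data$income[i], data$weight[i])
)
```

Because ave() returns a vector aligned with the original rows, the result can be assigned straight to the new column, just like the unweighted version.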

Thanks in advance for your help!

Asked Jul 21 '16 at 16:07 by Elixterra




1 Answer

Try using the dplyr package as follows:

df <- read.table(text = 'obs income education weight   
                          1   1000      A       10     
                          2   2000      B        1     
                          3   1500      B        5     
                          4   2000      A        2', 
                 header = TRUE)     

library(dplyr)

df_summary <- 
  df %>% 
  group_by(education) %>% 
  summarise(weighted_income = weighted.mean(income, weight))

df_summary
# education weighted_income
#     A        1166.667
#     B        1583.333

df_final <- left_join(df, df_summary, by = 'education')

df_final
# obs income education weight weighted_income
#  1   1000         A     10        1166.667
#  2   2000         B      1        1583.333
#  3   1500         B      5        1583.333
#  4   2000         A      2        1166.667
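The summarise() plus left_join() pattern above works. An alternative sketch in the same dplyr style uses mutate() on the grouped data, which computes the weighted mean per group and writes it to every row of that group, so the join step is not needed. It rebuilds the same example data so it can run on its own:

```r
library(dplyr)

df <- read.table(text = 'obs income education weight
                          1   1000      A       10
                          2   2000      B        1
                          3   1500      B        5
                          4   2000      A        2',
                 header = TRUE)

# On grouped data, mutate() evaluates weighted.mean() once per group and
# recycles that single value to every row of the group; ungroup() drops
# the grouping afterwards. Original row order is preserved.
df_final <- df %>%
  group_by(education) %>%
  mutate(weighted_income = weighted.mean(income, weight)) %>%
  ungroup()
```

The result is a tibble rather than a plain data frame; wrap it in as.data.frame() if that matters for later code.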
Answered Sep 30 '22 at 09:09 by Alex Ioannides