Questions about the by() function and weighted.mean() already exist, but none of them helped me solve my problem. I am new to R and more used to data-mining software than to programming.
I have a data frame that contains, for each individual (one observation per row), the income, the education level and the sample weight. I want to calculate the weighted mean of income by education level and attach the result to each individual in a new column of the original data frame, like this:
obs  income  education  weight  incomegroup
1    1000    A          10      weighted mean of income for education level A
2    2000    B          1       weighted mean of income for education level B
3    1500    B          5       weighted mean of income for education level B
4    2000    A          2       weighted mean of income for education level A
I tried:
data$incomegroup=by(data$education, function(x) weighted.mean(data$income, data$weight))
It does not work. A weighted mean is computed and shows up in the "incomegroup" column, but it seems to be computed over the whole data set rather than by group (or for only one group; I can't tell). I have read about the plyr package and the aggregate() function, but they do not seem to do what I need.
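For reference, a minimal sketch of a working by() call, assuming the example data is stored in a data frame named df (rebuilt by hand from the table above). by() takes the data first, then the grouping variable, then the function, and it passes each group's rows to that function as a data frame, so the function must use its argument rather than the full columns. The per-group results are then looked up for each row by education level:

```r
# Example data rebuilt from the table in the question (assumed, for illustration)
df <- data.frame(
  obs = 1:4,
  income = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight = c(10, 1, 5, 2)
)

# by(data, INDICES, FUN): FUN receives the subset of rows for one education level,
# so it computes the weighted mean from that subset only
group_means <- by(df, df$education, function(d) weighted.mean(d$income, d$weight))

# The result is indexed by level name; look up each row's level and drop the array attributes
df$incomegroup <- as.numeric(group_means[as.character(df$education)])

df
```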
The ave() function from the stats package gives exactly what I am looking for, but only for the unweighted mean:
data$incomegroup=ave(data$income,data$education,FUN = mean)
I don't see how to pass the weights to it.
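A minimal sketch of one way to keep using ave(), assuming the same example data in a data frame named df (column names taken from the question). ave()'s FUN only sees one vector per group, but the weighted mean is the group sum of weight * income divided by the group sum of the weights, and two ave() calls with FUN = sum give both totals on every row:

```r
# Example data rebuilt from the table in the question (assumed, for illustration)
df <- data.frame(
  obs = 1:4,
  income = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight = c(10, 1, 5, 2)
)

# Numerator: per-row group total of weight * income
# Denominator: per-row group total of weight
# Their ratio is the weighted mean income of the row's education level
df$incomegroup <- ave(df$income * df$weight, df$education, FUN = sum) /
  ave(df$weight, df$education, FUN = sum)

df
```

Because ave() always returns a vector of the same length and order as its input, the result can be assigned straight into a new column without any join.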
Thank you in advance for your help!
To find a weighted mean, multiply each value by its weight and add up the results. That sum is already the weighted mean only when the weights add up to one; otherwise, divide the sum of the weighted values by the sum of the weights.
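Written out for values x_i with weights w_i, and applied to the two education groups in the question, the rule gives:

```latex
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}

\bar{x}_A = \frac{10 \cdot 1000 + 2 \cdot 2000}{10 + 2} = \frac{14000}{12} \approx 1166.667

\bar{x}_B = \frac{1 \cdot 2000 + 5 \cdot 1500}{1 + 5} = \frac{9500}{6} \approx 1583.333
```

These agree with the weighted_income values in the dplyr output below.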
In SPSS, to weight cases, choose Data > Weight Cases, select "Weight cases by", and then choose the variable that contains the weights. To calculate means, choose Analyze > Descriptive Statistics > Descriptives and select the variables to analyze. Hope it helps.
In MATLAB, if A holds the values and B the weights (matrices of the same size):
weightedMeans = sum(A .* B, 2) ./ sum(B, 2); % Weighted mean within each row, going across columns.
Try using the dplyr package as follows:
df <- read.table(text = 'obs income education weight
1 1000 A 10
2 2000 B 1
3 1500 B 5
4 2000 A 2',
header = TRUE)
library(dplyr)
df_summary <-
  df %>%
  group_by(education) %>%
  summarise(weighted_income = weighted.mean(income, weight))
df_summary
# education weighted_income
# A 1166.667
# B 1583.333
df_final <- left_join(df, df_summary, by = 'education')
df_final
# obs income education weight weighted_income
# 1 1000 A 10 1166.667
# 2 2000 B 1 1583.333
# 3 1500 B 5 1583.333
# 4 2000 A 2 1166.667
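If the separate summary table is not needed, a grouped mutate() should produce the same column in one step, because mutate() computes within each group while keeping every row in its original order. A sketch, assuming dplyr is installed; it repeats the read.table() call so it runs on its own:

```r
library(dplyr)

df <- read.table(text = 'obs income education weight
1 1000 A 10
2 2000 B 1
3 1500 B 5
4 2000 A 2',
header = TRUE)

# Grouped mutate(): each row gets the weighted mean of its own education level,
# so no summary table or left_join() is needed
df_final <- df %>%
  group_by(education) %>%
  mutate(weighted_income = weighted.mean(income, weight)) %>%
  ungroup()

df_final
```

The final ungroup() drops the grouping, so later operations on df_final are not silently carried out per education level.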