I have the following dataset (simple version of my actual data), 'data', and would like to calculate weighted means for variables x1 and x2, using weightings w1 and w2 respectively, split up into two groups (groups determined by the variable n).
data <- data.frame(n = c(1,1,1,2,2,2), x1 = c(4,5,4,7,5,5), x2 = c(7,10,9,NaN,11,12), w1 = c(0,1,1,1,1,1), w2 = c(1,1,1,0,0,1))
I'm trying to do it using with() but get an error when I run this:
with(data, aggregate(x = list(x1=x1, x2=x2), by = list(n = n), FUN = weighted.mean, w = list(w1 = w1,w2 = w2)))
On the other hand, if weights aren't specified it works, but in this case the default equal weights are used (i.e. same as using FUN=mean).
with(data, aggregate(x = list(x1=x1, x2=x2), by = list(n = n), FUN = weighted.mean))
This question is similar to weighted means by group and column, except that my question includes different weightings for different columns. I tried using a data table but it runs into the same weighting errors as with(). Thanks in advance for any help.
The weighted mean is calculated by multiplying each value by its weight, adding all the products, and dividing by the sum of the weights. If all the weights are equal, then the weighted mean and arithmetic mean will be the same.
The weighted mean is the average obtained by summing the products of the weights and the values, then dividing this sum by the sum of the weights. If the weights are proportions, they should sum to 1. In base R, the function weighted.mean computes it.
To find a weighted average, multiply each number by its weight, then add the results. If the weights don't add up to one, divide that sum by the sum of the weights.
For example, with test scores, multiply each score by its weight (percentage), add the products together, then divide by the sum of the weights. The result is the student's weighted average. In an unweighted set of test scores, each score, or quantity, counts equally.
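As a small illustration of the definition above, the sketch below computes a weighted mean by hand and compares it with base R's weighted.mean. The values are the x1 and w1 entries for group n = 1 from the question's data; the variable names (manual, builtin, equal) are only illustrative.

```r
# Values and weights for group n = 1 of the question's data (columns x1 and w1)
x <- c(4, 5, 4)
w <- c(0, 1, 1)

# By hand: sum of value * weight, divided by the sum of the weights
manual <- sum(x * w) / sum(w)

# Base R equivalent
builtin <- weighted.mean(x, w)

# With equal weights the weighted mean reduces to the arithmetic mean
equal <- weighted.mean(x, rep(1, length(x)))

print(c(manual = manual, builtin = builtin, equal = equal, plain = mean(x)))
```

Because the first observation has weight 0, it does not contribute to the weighted result, whereas it does contribute to the plain mean.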
Try
library(data.table)
setDT(data)[, .(x1=weighted.mean(x1, w1), x2=weighted.mean(x2, w2)) , by = n]
Or as @thelatemail commented, we can use Map to loop over the "x" columns and the corresponding "w" columns, calling weighted.mean on each pair
setDT(data)[, Map(weighted.mean, list(x1,x2), list(w1,w2)), by = n]
If there are many "x" and "w" columns, we can use grep to get the column names and mget to return the values inside the Map. Anchoring the patterns with ^ avoids picking up other columns that merely contain an "x" or a "w"
setDT(data)[, Map(weighted.mean, mget(grep('^x', names(data),
value=TRUE)), mget(grep('^w', names(data), value=TRUE))), by = n]
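If data.table is not available, the same pairing of each "x" column with its own "w" column per group can be sketched in base R with split and lapply. This is a swapped-in alternative, not the method from the answer above, and by_group and result are illustrative names.

```r
# The data from the question
data <- data.frame(n  = c(1, 1, 1, 2, 2, 2),
                   x1 = c(4, 5, 4, 7, 5, 5),
                   x2 = c(7, 10, 9, NaN, 11, 12),
                   w1 = c(0, 1, 1, 1, 1, 1),
                   w2 = c(1, 1, 1, 0, 0, 1))

# Split the rows by group, then compute each weighted mean
# with the weight column that belongs to it
by_group <- lapply(split(data, data$n), function(d) {
  data.frame(n  = d$n[1],
             x1 = weighted.mean(d$x1, d$w1),
             x2 = weighted.mean(d$x2, d$w2))
})

# Stack the one-row results into a single data frame
result <- do.call(rbind, by_group)
print(result)
```

The aggregate call in the question fails because the whole list of weights is passed unchanged to every call of weighted.mean, so the weights no longer line up with the rows of each group. Splitting the data frame first keeps values and weights together.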