a=c(1,2,NA,4)
b=c(10,NA,30,40)
weighted.mean(a,b,na.rm = T)
The above code gives me NA as the answer. I think na.rm only ignores the NA values in vector a and not in b. How can I ignore the NA in vector b, that is, in the weights? I cannot simply change the NA to 0. I know that would do the trick, but I am looking for a tweak in the formula itself.
The weighted mean is found by summing the products of the weights and the values, then dividing this sum by the sum of the weights. If the weights are expressed as proportions, they sum to 1 and the division leaves the result unchanged.
A weighted average is the average of a set of numbers, each with its own associated weight. To find it, multiply each number by its weight, add the results, and divide by the sum of the weights.
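As a minimal sketch of the definition above, the weighted mean can be computed by hand and compared with the built-in weighted.mean. The vectors x and w are made-up illustration values, not taken from the question, and contain no missing entries.

```r
# Hypothetical values and weights (illustration only, no NA entries)
x <- c(1, 2, 4)
w <- c(10, 20, 40)

# Sum of the value-weight products divided by the sum of the weights
manual <- sum(x * w) / sum(w)

# The built-in function should agree with the manual calculation
builtin <- weighted.mean(x, w)

print(manual)
print(builtin)
```

Both calls should print the same number, since they apply the same formula.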
For a weighted median we change how the middle is found: instead of finding the middle value, we look for the middle weight, and the median is the value associated with that weight. Here's a very high-level algorithm: Sort the values. Add up the weights for the values in order (i.e. a running sum of weights). The weighted median is the value at which this running sum first reaches half of the total weight.
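The high-level algorithm above could be sketched in R roughly as follows. The function name weighted_median_sketch is hypothetical, and taking the first value whose running weight reaches half of the total is one convention among several; it assumes no NA entries and non-negative weights.

```r
# Hypothetical helper: returns the first sorted value whose running
# weight reaches half of the total weight. Assumes no NAs in x or w.
weighted_median_sketch <- function(x, w) {
  ord <- order(x)        # sort the values ...
  x <- x[ord]
  w <- w[ord]            # ... and carry the weights along
  cum_w <- cumsum(w)     # running sum of the weights
  x[which(cum_w >= sum(w) / 2)[1]]
}

# Illustration values only
print(weighted_median_sketch(c(1, 2, 4), c(10, 20, 40)))
print(weighted_median_sketch(c(3, 1, 2), c(1, 1, 1)))
```

With equal weights this reduces to picking a middle value of the sorted data, which is a quick sanity check on the logic.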
In calculating a simple average, or arithmetic mean, all numbers are treated equally and assigned equal weight. But a weighted average assigns weights that determine in advance the relative importance of each data point. A weighted average is most often computed to equalize the frequency of the values in a data set.
This is the function I ended up writing to solve this problem:
weighted_mean <- function(x, w, ..., na.rm = FALSE){
  if(na.rm){
    # Keep only the rows where both the value and the weight are present
    df_omit <- na.omit(data.frame(x, w))
    return(weighted.mean(df_omit$x, df_omit$w, ...))
  }
  weighted.mean(x, w, ...)
}
I adapted Mhairi's code to use neither data.frame nor na.omit:
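weighted_mean = function(x, w, ..., na.rm=FALSE){
  if(na.rm){
    # Keep only the positions where both the value and the weight are present
    keep = !is.na(x)&!is.na(w)
    w = w[keep]
    x = x[keep]
  }
  # NAs were already dropped above when na.rm is TRUE
  weighted.mean(x, w, ..., na.rm=FALSE)
}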
It's really surprising that R's built-in weighted.mean with na.rm=T doesn't handle NA weights. I just wasted a few hours discovering that.
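The behaviour can be reproduced by hand. The sketch below assumes, as the question suggests, that na.rm = TRUE drops only the positions where the values are NA and leaves the weights unchecked. The names keep_x, by_hand and builtin are introduced here for illustration.

```r
a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)

# Drop only the positions where the VALUES are missing
keep_x <- !is.na(a)
x <- a[keep_x]   # 1 2 4
w <- b[keep_x]   # 10 NA 40: the NA weight is still there

# The surviving NA weight makes both sums NA, so the result is NA
by_hand <- sum(x * w) / sum(w)
builtin <- weighted.mean(a, b, na.rm = TRUE)

print(by_hand)
print(builtin)
```

Both results are NA, which matches what the question reports for the built-in call.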
I made a simple modification to the weight w in weighted.mean by using coalesce, as follows:
a = c(1,2,NA,4)
b = c(10,NA,30,40)
weighted.mean(a, dplyr::coalesce(b,0), na.rm = T)
The idea is to replace the missing weights with zeros, which fixes the error. It returns 3.4.
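For readers without dplyr, a base-R stand-in for the coalesce call might look like the sketch below, shown next to the complete-case approach from the earlier answers. The names b_zero, zero_weight, keep and complete_case are introduced here for illustration.

```r
a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)

# Base-R stand-in for dplyr::coalesce(b, 0): set missing weights to zero
b_zero <- replace(b, is.na(b), 0)
zero_weight <- weighted.mean(a, b_zero, na.rm = TRUE)

# Complete-case alternative: drop pairs where either entry is NA
keep <- !is.na(a) & !is.na(b)
complete_case <- weighted.mean(a[keep], b[keep])

print(zero_weight)
print(complete_case)
```

On this data the two approaches agree, because a zero weight removes an observation from both the numerator and the denominator, just as dropping the pair does.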