Weighted Average in R using NA weights

Tags: r, na, mean

 a = c(1, 2, NA, 4)
 b = c(10, NA, 30, 40)
 weighted.mean(a, b, na.rm = TRUE)

The code above returns NA. I think na.rm only drops the NA values in vector a, not those in b. How can I ignore the NA values in b, that is, in the weights? I would rather not simply change the NA to 0; I know that would do the trick, but I am looking for a tweak to the formula itself.

Jain asked Oct 26 '16 17:10




3 Answers

This is the function I ended up writing to solve this problem:

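weighted_mean <- function(x, w, ..., na.rm = FALSE) {
  if (na.rm) {
    # Drop every pair in which either the value or the weight is NA
    df_omit <- na.omit(data.frame(x, w))
    return(weighted.mean(df_omit$x, df_omit$w, ...))
  }
  # Without na.rm, defer to the built-in behaviour
  weighted.mean(x, w, ...)
}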
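
As a quick check, here is a self-contained sketch that repeats the wrapper and applies it to the vectors from the question. The variable name `result` is only for illustration; the expected value follows from the two complete pairs, (1, 10) and (4, 40).

```r
# Wrapper from this answer, repeated so the snippet runs on its own.
weighted_mean <- function(x, w, ..., na.rm = FALSE) {
  if (na.rm) {
    # Keep only the pairs where both the value and the weight are present.
    df_omit <- na.omit(data.frame(x, w))
    return(weighted.mean(df_omit$x, df_omit$w, ...))
  }
  weighted.mean(x, w, ...)
}

a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)

# Pairs 2 and 3 are dropped, leaving (1*10 + 4*40) / (10 + 40) = 3.4
result <- weighted_mean(a, b, na.rm = TRUE)
print(result)
```

With na.rm left at its default of FALSE the wrapper behaves like the built-in function and returns NA for these inputs.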
Mhairi McNeill answered Oct 14 '22 03:10


I adapted Mhairi's code to use neither data.frame nor na.omit:

weighted_mean = function(x, w, ..., na.rm = FALSE) {
  if (na.rm) {
    # Keep only positions where both the value and the weight are present
    keep = !is.na(x) & !is.na(w)
    w = w[keep]
    x = x[keep]
  }
  weighted.mean(x, w, ..., na.rm = FALSE)
}

It's really surprising that R's built-in weighted.mean with na.rm=TRUE doesn't handle NA weights. I just wasted a few hours discovering that.
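
To make that behaviour concrete, the sketch below compares the built-in call with two hand-written versions using the question's data. The hand-written formulas are an illustration of the idea, not a copy of R's source, and the names `keep_x`, `by_hand`, `keep` and `fixed` are made up for this example.

```r
a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)

# Built-in: na.rm = TRUE strips the NA in `a` (and its weight),
# but the NA weight paired with a[2] is still there, so the result is NA.
builtin <- weighted.mean(a, b, na.rm = TRUE)

# Roughly the same thing by hand: filter on the values only.
keep_x <- !is.na(a)
by_hand <- sum(a[keep_x] * b[keep_x]) / sum(b[keep_x])

# Filtering on both vectors removes the NA weight as well.
keep <- !is.na(a) & !is.na(b)
fixed <- sum(a[keep] * b[keep]) / sum(b[keep])

print(c(builtin = builtin, by_hand = by_hand, fixed = fixed))
```

Only the last version, which filters on both vectors, yields a number here.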

webb answered Oct 14 '22 05:10


I made a simple modification to the weights w passed to weighted.mean, using coalesce as follows:

a = c(1, 2, NA, 4)
b = c(10, NA, 30, 40)
weighted.mean(a, dplyr::coalesce(b, 0), na.rm = TRUE)

The idea is to replace the missing weights with zeros, which fixes the problem. The result is 3.4.
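
If dplyr is not available, the same idea can be sketched in base R. This assumes `replace(b, is.na(b), 0)` is an acceptable stand-in for `dplyr::coalesce(b, 0)` on a plain numeric vector; the names `b_zero`, `zero_weight` and `dropped` are hypothetical. The snippet also checks that zeroing a weight gives the same answer as dropping the pair entirely.

```r
a <- c(1, 2, NA, 4)
b <- c(10, NA, 30, 40)

# Base-R stand-in for dplyr::coalesce(b, 0): swap NA weights for 0.
b_zero <- replace(b, is.na(b), 0)
zero_weight <- weighted.mean(a, b_zero, na.rm = TRUE)

# Alternative: drop every pair with a missing value or weight.
keep <- !is.na(a) & !is.na(b)
dropped <- weighted.mean(a[keep], b[keep])

# A zero weight contributes nothing to either sum, so both agree.
print(c(zero_weight = zero_weight, dropped = dropped))
```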

Dien Giau Bui answered Oct 14 '22 03:10