
Difference between mean and manual calculation in R?

Tags:

math

r

I am writing a simple function in R to calculate percentage differences between two input numbers.

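# Percentage difference, using the manual average (a + b) / 2
pdiff <- function(a, b)
{
  # collapse vector inputs to their median
  if (length(a) > 1) a <- median(a)
  if (length(b) > 1) b <- median(b)
  (abs(a - b) / ((a + b) / 2)) * 100
}

# Same calculation, but using mean(a, b) for the average
pdiffa <- function(a, b)
{
  if (length(a) > 1) a <- median(a)
  if (length(b) > 1) b <- median(b)
  (abs(a - b) / mean(a, b)) * 100
}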

When I run them with sample values of a and b, the two functions give different results:

x <- 5
y <- 10
pdiff(x, y)  # gives about 66.67%
pdiffa(x, y) # gives 100%


Stepping through the code, (x+y)/2 evaluates to 7.5 but mean(x,y) evaluates to 5. Am I missing something really obvious and stupid here?


Rover Eye asked Mar 28 '17 22:03



2 Answers

This is due to a nasty "gotcha" in the mean() function (not listed in the list of R traps, but probably should be): you want mean(c(a,b)), not mean(a,b). From ?mean:

mean(x, ...)
[snip snip snip]
... further arguments passed to or from other methods.

So what happens if you call mean(5,10)? mean calls the mean.default method, which has trim as its second argument:

trim the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.

The last phrase, "values of trim outside that range are taken as the nearest endpoint", means that values of trim larger than 0.5 are set to 0.5. That asks mean to throw away 50% of the data from each end of the data set, so all that's left is the median. Debugging our way through mean.default, we see that we do indeed end up at this code:

if (trim >= 0.5) 
      return(stats::median(x, na.rm = FALSE))

So mean(x, <value_greater_than_0.5>) returns the median of x alone. With x = 5 that is median(5), which is just 5.
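To make the argument matching concrete, here is a small sketch you can paste into an R session. The variable names (m_wrong, m_trim, m_right, v) are only illustrative. It shows that a second positional argument is taken as trim, and that a trim of 0.5 or more makes mean() fall back to the median:

```r
# The second positional argument is matched to `trim`, not treated as data.
m_wrong <- mean(5, 10)          # equivalent to mean(5, trim = 10)
m_trim  <- mean(5, trim = 0.5)  # trim at the upper endpoint: median of x
m_right <- mean(c(5, 10))       # both values passed together as one vector

# With a longer vector the same thing is easier to see:
# trim >= 0.5 returns the median rather than the arithmetic mean.
v        <- c(1, 2, 3, 100)
m_v_trim <- mean(v, trim = 0.5)
med_v    <- median(v)

print(c(m_wrong, m_trim, m_right))
print(c(m_v_trim, med_v, mean(v)))
```

Here m_wrong and m_trim both come out as 5, while m_right is 7.5, and m_v_trim matches median(v) instead of mean(v).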

Ben Bolker answered Nov 08 '22 19:11



Try mean(5, 10) by itself.

mean(5, 10)
[1] 5

Now try mean(c(5, 10)).

mean(c(5, 10))
[1] 7.5

mean takes a vector as its first argument, so the two numbers have to be combined with c() before they are passed in.
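Applied to the function from the question, the fix might look like the sketch below. The name pdiff_fixed is made up for illustration, and the length check is written as length(a) > 1 on the assumption that the median step is only meant for vector inputs:

```r
# Hypothetical corrected version of the asker's function:
# the two values are wrapped in c() so that mean() averages both of them.
pdiff_fixed <- function(a, b) {
  # collapse vector inputs to their median (assumed intent of the original check)
  if (length(a) > 1) a <- median(a)
  if (length(b) > 1) b <- median(b)
  abs(a - b) / mean(c(a, b)) * 100
}

res_scalar <- pdiff_fixed(5, 10)
res_vector <- pdiff_fixed(c(4, 5, 6), c(9, 10, 11))
print(res_scalar)
print(res_vector)
```

Both calls should agree with the manual (a+b)/2 calculation, giving roughly 66.67.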

neilfws answered Nov 08 '22 20:11
