I am writing a simple function in R to calculate percentage differences between two input numbers.
pdiff <- function(a, b)
{
  # reduce vector inputs to a single value via the median
  if (length(a) >= 1) a <- median(a)
  if (length(b) >= 1) b <- median(b)
  (abs(a - b) / ((a + b) / 2)) * 100
}
pdiffa <- function(a, b)
{
  if (length(a) >= 1) a <- median(a)
  if (length(b) >= 1) b <- median(b)
  (abs(a - b) / mean(a, b)) * 100
}
When I run them with arbitrary values of a and b, the two functions give different results:
x <- 5
y <- 10
pdiff(x, y)   # gives 66.67%
pdiffa(x, y)  # gives 100%
When I step through the code, (x+y)/2 evaluates to 7.5 but mean(x,y) evaluates to 5. Am I missing something really obvious and stupid here?
This is due to a nasty "gotcha" in the mean() function (not listed in the list of R traps, but probably should be): you want mean(c(a,b)), not mean(a,b). From ?mean:
mean(x, ...)
[snip snip snip]
...
further arguments passed to or from other methods.
So what happens if you call mean(5,10)? mean calls the mean.default method, which has trim as its second argument:
trim
the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
The last phrase, "values of trim outside that range are taken as the nearest endpoint", means that values of trim larger than 0.5 are set to 0.5. We are then asking mean to throw away 50% of the data at each end of the data set, so all that's left is the median. Debugging our way through mean.default, we see that we indeed end up at this code ...
if (trim >= 0.5)
return(stats::median(x, na.rm = FALSE))
So mean(x, <value_greater_than_0.5>) returns the median of x. With x = 5 that is the median of c(5), which is just 5 ...
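The effect of trim, and of passing a second positional argument by accident, can be checked directly. This is a small sketch with made-up data (the vector v is only an illustration, not from the question):

```r
# Illustrative data: one large outlier at the end
v <- c(1, 2, 3, 4, 100)

mean(v)               # 22, the ordinary mean
mean(v, trim = 0.2)   # 3, the mean of 2, 3, 4 after dropping one value from each end

# A second unnamed argument is matched to trim.
# Any value >= 0.5 makes mean() return the median.
mean(v, 10) == median(v)   # TRUE
mean(5, 10)                # 5, the median of the single value 5
```

Naming the argument (trim = ...) whenever trimming is actually intended makes this kind of accidental match easier to spot.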
Try mean(5, 10) by itself.
mean(5, 10)
[1] 5
Now try mean(c(5, 10)).
mean(c(5, 10))
[1] 7.5
mean takes a vector as its first argument, so the values to be averaged have to be combined with c() first.
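Applied to the function from the question, the fix might look like the sketch below. The name pdiff_fixed is hypothetical and the length checks simply follow the question's apparent intent; the only substantive change is wrapping the two values in c() before calling mean.

```r
# Percentage difference relative to the mean of the two values.
# Vector inputs are first reduced to their medians, as the question intends.
pdiff_fixed <- function(a, b) {
  if (length(a) >= 1) a <- median(a)
  if (length(b) >= 1) b <- median(b)
  # c(a, b) builds one vector, so mean() averages both values
  # instead of treating b as the trim argument
  (abs(a - b) / mean(c(a, b))) * 100
}

pdiff_fixed(5, 10)   # 66.66667, the same as (abs(5-10)/((5+10)/2))*100
```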