
Why do the results of mad(x) differ from the expected results?

I am trying to calculate the mean average deviation of a sample ("s") of numbers. The result I get from the "mad()" function differs from the result I get when I do the mean average deviation calculation one step at a time. Why?

 s<- c(100,110,114,121,130,130,160)

Using the "mad()" function, I get:

> mad(s)
[1] 13.3434

When breaking down the formula and doing the same operation one step at a time, I get:

> sum(abs(s-mean(s)))/length(s)
[1] 14.08163

Why do these results differ?

Am I making an error when entering my formula? (This would not be surprising - I am just starting to learn R). What is wrong with my formula?

Or is the formula that R uses to calculate the mean average deviation different from the following (given on Wikipedia)

MAD = (sum of (absolute values of (each value minus average value for sample))) divided by (the number of values in the sample)?

(Thank you for your help!)

Larix.laricina asked Jun 28 '15 02:06


1 Answer

"MAD" is unfortunately a term with multiple meanings; mean absolute deviation from the mean (sometimes just called the MD or mean deviation), median absolute deviation from the median, mean absolute deviation from the median (which arises when computing scale in a Laplace), etc. Wikipedia -- while often useful -- is not the arbiter of usage; it can sometimes be a little idiosyncratic in its use of terms (that's not particularly a criticism of Wikipedia; it's partly inherent in the nature of the thing). [Personally in the absence of further clues I'd usually interpret MAD as median absolute deviation from the median, and expect mean absolute deviation from the mean if not written in full to be written either as "mean deviation"/"MD" or "mean absolute deviation".]

The question of which one R is computing is resolved by the simple expedient of reading ?mad:

 mad {stats}    R Documentation

 Median Absolute Deviation

 Description

 Compute the median absolute deviation, i.e., the (lo-/hi-) median of the 
 absolute deviations from the median, and (by default) adjust by a factor 
 for asymptotically normal consistency.

Just as a general suggestion, when using a function for the first time, don't assume you know what it's doing. For example, before I read the help for mad for the first time, I wouldn't have expected it to multiply by that constant by default. (I think that's a bad idea, since it means that by default it doesn't actually compute anything called MAD, but instead a robust estimate of σ for a population where the uncontaminated part is Gaussian. But that's how it works.)
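
As a quick check on the sample s from the question, the value returned by mad(s) can be rebuilt by hand. This sketch assumes the documented default constant = 1.4826 (approximately 1/qnorm(3/4)); raw_mad and scaled_mad are names invented here.

```r
s <- c(100, 110, 114, 121, 130, 130, 160)

# Unscaled median absolute deviation from the median
raw_mad <- median(abs(s - median(s)))
print(raw_mad)     # 9

# Default behaviour of mad(): multiply by 1.4826
scaled_mad <- 1.4826 * raw_mad
print(scaled_mad)  # 13.3434

# Agrees with the built-in function
print(all.equal(mad(s), scaled_mad))  # TRUE

# Setting constant = 1 switches the scaling off
print(mad(s, constant = 1))           # 9
```

So the 13.3434 in the question is 9 times 1.4826, not a mean of anything.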

Most functions will do what you think they do, but a few may surprise you. Check the definitions in the help, look at how the inputs and outputs are defined, and try the examples.

Incidentally, if you want the median (absolute) deviation from the mean, you could get that with mad(x, center = mean(x), constant = 1). But if you want the mean deviation from the mean, I don't know if there's anything simpler to write than mean(abs(x - mean(x))); it has at least the advantage of being utterly explicit.
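
Both suggestions can be tried on the sample s from the question. This is only a sketch: mean_abs_dev is a hypothetical helper name (base R does not provide a function by that name), and its na.rm argument is an added convenience, not something from the answer.

```r
s <- c(100, 110, 114, 121, 130, 130, 160)

# Median absolute deviation from the mean, with no scaling constant
med_dev_from_mean <- mad(s, center = mean(s), constant = 1)
print(med_dev_from_mean)  # 9.571429

# Hypothetical helper: mean absolute deviation from the mean
mean_abs_dev <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]  # optionally drop missing values first
  mean(abs(x - mean(x)))
}

print(mean_abs_dev(s))    # 14.08163, the hand calculation from the question
```

Wrapping the one-liner in a named function is optional; it mainly helps if the same calculation is needed in several places.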

Glen_b answered Nov 24 '22 06:11