
Standard Deviation in R Seems to be Returning the Wrong Answer - Am I Doing Something Wrong?

A simple example of calculating the standard deviation:

d <- c(2,4,4,4,5,5,7,9)
sd(d)

yields

[1] 2.13809

but when done by hand, the answer is 2. What am I missing here?

Travis Rodman asked Jun 23 '11 16:06




3 Answers

Try this

R> sd(c(2,4,4,4,5,5,7,9)) * sqrt(7/8)
[1] 2
R> 

and see the rest of the Wikipedia article on standard deviation for the discussion about estimating standard deviations. The formula employed 'by hand' divides by N, which gives a biased estimate; sd() divides by N-1 instead, hence the correction factor of sqrt((N-1)/N) to get back to the by-hand value. Here is a key quote:

The term standard deviation of the sample is used for the uncorrected estimator (using N) while the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals, (x_1 − x̄, …, x_N − x̄).
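To make the two estimators in the quote concrete, here is a minimal sketch that works through both by hand on the data from the question. The variable names (n, ss, sd_uncorrected, sd_corrected) are made up for illustration; only sd(), mean(), sum() and sqrt() are base R.

```r
# Data from the question
d <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(d)

# Sum of squared deviations from the mean (the mean is 5, the sum is 32)
ss <- sum((d - mean(d))^2)

# Uncorrected estimator: divide by N (the "by hand" result)
sd_uncorrected <- sqrt(ss / n)

# Corrected estimator: divide by N - 1 (what sd() computes)
sd_corrected <- sqrt(ss / (n - 1))

sd_uncorrected                           # 2
sd_corrected                             # 2.13809, same as sd(d)
isTRUE(all.equal(sd_corrected, sd(d)))   # TRUE
```

The only difference between the two results is the denominator, which is why multiplying sd(d) by sqrt(7/8) recovers 2.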

Dirk Eddelbuettel answered Oct 05 '22 05:10


Looks like R is assuming (n-1) in the denominator, not n.

duffymo answered Oct 04 '22 05:10


When I want the population variance or standard deviation (n as the denominator), I define these two helper functions, each of which takes a numeric vector.

  pop.var <- function(x) var(x) * (length(x)-1) / length(x)

  pop.sd <- function(x) sqrt(pop.var(x))
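As a quick check, these helpers can be applied to the data from the question. This is only a usage sketch: the definitions are repeated from above so the snippet runs on its own, and the vector d is the one from the original post.

```r
# Helper definitions repeated from the answer so this runs standalone
pop.var <- function(x) var(x) * (length(x) - 1) / length(x)
pop.sd  <- function(x) sqrt(pop.var(x))

# Data from the question
d <- c(2, 4, 4, 4, 5, 5, 7, 9)

pop.var(d)  # 4       (n in the denominator)
pop.sd(d)   # 2       (matches the by-hand answer)
sd(d)       # 2.13809 (n - 1 in the denominator)
```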

BTW, Khan Academy has a good discussion of population versus sample standard deviation.

Ken Lin answered Oct 03 '22 05:10