A simple example of calculating the standard deviation:
d <- c(2,4,4,4,5,5,7,9)
sd(d)
yields
[1] 2.13809
but when done by hand, the answer is 2. What am I missing here?
Standard error increases when the standard deviation, i.e. the spread of the population, increases. Standard error decreases when sample size increases: as the sample size gets closer to the true size of the population, the sample means cluster more and more around the true population mean.
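As a rough illustration of the sample-size effect, here is a minimal sketch that computes the standard error of the mean as sd(x) / sqrt(n). The helper name std.error is hypothetical (it is not a base R function), and the larger sample is made artificially by repeating the same values.

```r
# Hypothetical helper: standard error of the mean, sd(x) / sqrt(n)
std.error <- function(x) sd(x) / sqrt(length(x))

d <- c(2, 4, 4, 4, 5, 5, 7, 9)

se.small <- std.error(d)           # 8 observations
se.large <- std.error(rep(d, 4))   # same values repeated, 32 observations

se.large < se.small                # TRUE: more observations, smaller standard error
```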
The standard deviation is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean.
To calculate the standard deviation in R, use the built-in sd() function. It accepts a numeric vector and returns the sample standard deviation, computed as the square root of the variance given by var(), which uses n - 1 in the denominator.
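The relationship between sd() and var() can be checked directly. The sketch below uses the vector from the question; the variable name manual.var is only for illustration.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance written out explicitly with the n - 1 denominator
manual.var <- sum((x - mean(x))^2) / (n - 1)

all.equal(var(x), manual.var)    # TRUE
all.equal(sd(x), sqrt(var(x)))   # TRUE
```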
Try this
R> sd(c(2,4,4,4,5,5,7,9)) * sqrt(7/8)
[1] 2
R>
and see the rest of the Wikipedia article for the discussion about estimation of standard deviations. The formula employed 'by hand' divides by N, which gives a biased estimate when the data are a sample; sd() divides by N-1 instead, hence the factor of sqrt((N-1)/N) to convert between the two. Here is a key quote:
The term standard deviation of the sample is used for the uncorrected estimator (using N) while the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals, $(x_1 - \bar{x}, \ldots, x_N - \bar{x})$.
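Written out, the two estimators differ only in the denominator. The symbols $s_N$ (uncorrected) and $s$ (corrected) are chosen here for illustration; the numbers are for the eight values in the question, whose mean is 5 and whose squared deviations sum to 32.

```latex
s_N = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2} = \sqrt{\frac{32}{8}} = 2,
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2} = \sqrt{\frac{32}{7}} \approx 2.138,
\qquad
s_N = s\,\sqrt{\frac{N-1}{N}}.
```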
Looks like R is assuming (n-1) in the denominator, not n.
When I want the population variance or standard deviation (n as denominator), I define these two helper functions, each of which takes a numeric vector:
pop.var <- function(x) var(x) * (length(x)-1) / length(x)
pop.sd <- function(x) sqrt(pop.var(x))
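An equivalent way to get the same number, sketched here as an alternative rather than as the approach above, is to write the n-denominator formula directly instead of rescaling var(). The name pop.sd.direct is hypothetical.

```r
# Hypothetical alternative: population SD straight from the definition,
# i.e. the square root of the mean squared deviation from the mean
pop.sd.direct <- function(x) sqrt(mean((x - mean(x))^2))

d <- c(2, 4, 4, 4, 5, 5, 7, 9)

pop.sd.direct(d)                                    # [1] 2
all.equal(pop.sd.direct(d), sd(d) * sqrt(7 / 8))    # TRUE
```

Rescaling var() and using the direct formula agree up to floating-point error, so the choice is mostly a matter of readability.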
BTW, Khan Academy has a good discussion of population and sample standard deviation.