Standard Deviation in R Seems to be Returning the Wrong Answer - Am I Doing Something Wrong?

Tags: r, statistics, standard-deviation

A simple example of calculating the standard deviation:

d <- c(2,4,4,4,5,5,7,9)
sd(d)

yields

[1] 2.13809

but when done by hand, the answer is 2. What am I missing here?

asked Jun 23 '11 16:06

Travis Rodman

3 Answers

Try this

R> sd(c(2,4,4,4,5,5,7,9)) * sqrt(7/8)
[1] 2
R>

and see the rest of the Wikipedia article for the discussion about estimation of standard deviations. The formula employed 'by hand' divides by N, which gives a biased estimate when the data are a sample; R's sd() divides by N − 1 instead, hence the correction factor of sqrt((N-1)/N) to get from R's result back to the by-hand value. Here is a key quote:

The term standard deviation of the sample is used for the uncorrected estimator (using N) while the term sample standard deviation is used for the corrected estimator (using N − 1). The denominator N − 1 is the number of degrees of freedom in the vector of residuals.
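If you want to see the bias rather than take it on faith, a small simulation is one way to do it. This is only a sketch under assumptions that are not in the question: normally distributed data with true variance 1, an arbitrary seed, and 10000 repetitions. Note that the N − 1 version is unbiased for the variance; its square root is still slightly biased for the standard deviation, which is why the sketch compares variances.

```r
# Minimal simulation sketch (assumed setup: normal data with true variance 1).
set.seed(42)   # arbitrary seed, only for reproducibility
n    <- 8      # same sample size as the vector in the question
reps <- 10000  # number of simulated samples (an assumption)

# Each column of the matrix is one simulated sample of size n.
samples <- matrix(rnorm(n * reps, mean = 0, sd = 1), nrow = n)

# var() divides by n - 1; rescaling gives the divide-by-n version.
var_corrected   <- apply(samples, 2, var)
var_uncorrected <- var_corrected * (n - 1) / n

mean_corrected   <- mean(var_corrected)    # expected to land near 1
mean_uncorrected <- mean(var_uncorrected)  # expected to land near (n - 1) / n

print(c(corrected = mean_corrected, uncorrected = mean_uncorrected))
```

On average the divide-by-N variance should come out too small by roughly the factor (N − 1)/N, which is 0.875 for N = 8.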

answered Oct 05 '22 05:10

Dirk Eddelbuettel

Looks like R is assuming (n-1) in the denominator, not n.
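A quick way to check this is to do the "by hand" calculation in R with both denominators. The variable names below (ss, sd_n, sd_n_minus_1) are made up for this sketch:

```r
d <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(d)

# Sum of squared deviations from the mean; the mean is 5, so this is 32.
ss <- sum((d - mean(d))^2)

sd_n         <- sqrt(ss / n)        # divide by n: the "by hand" result, 2
sd_n_minus_1 <- sqrt(ss / (n - 1))  # divide by n - 1: about 2.13809

# sd() should agree with the n - 1 version, not the n version.
print(isTRUE(all.equal(sd(d), sd_n_minus_1)))
```

So the two answers differ only in the denominator: 32/8 = 4 gives 2, while 32/7 gives roughly 2.13809.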

answered Oct 04 '22 05:10

duffymo

When I want the population variance or standard deviation (n as denominator), I define these two helper functions. Each takes a numeric vector and returns a single number.

  pop.var <- function(x) var(x) * (length(x)-1) / length(x)

  pop.sd <- function(x) sqrt(pop.var(x))

BTW, Khan Academy has a good discussion of population and sample standard deviation.
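For what it's worth, here is how those two functions might be used on the vector from the question. The data frame df and its column b are hypothetical, added only to show one way of applying the functions column by column with sapply():

```r
pop.var <- function(x) var(x) * (length(x) - 1) / length(x)
pop.sd  <- function(x) sqrt(pop.var(x))

d <- c(2, 4, 4, 4, 5, 5, 7, 9)

pv <- pop.var(d)  # divide-by-n variance
ps <- pop.sd(d)   # divide-by-n standard deviation, the "by hand" value
print(c(pop.var = pv, pop.sd = ps, sd = sd(d)))

# Hypothetical data frame: each function works on one vector at a time,
# so sapply() is used here to get one value per column.
df <- data.frame(a = d, b = 1:8)
col_sds <- sapply(df, pop.sd)
print(col_sds)
```

Neither function handles missing values; if the data may contain NA, one option is to drop them first (for example with na.omit()) so that var() and length() see the same elements.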

answered Oct 03 '22 05:10

Ken Lin
