Looking for a way to calculate Population Standard Deviation in R -- using greater than 10 samples. Unable to extract the source C code in R to find the method of calculation.
# Sample Standard Deviation
# Note: All the below match with 10 or less samples
n <- 10 # 10 or greater it shifts calculation
set.seed(1)
x <- rnorm(n, 10)
# Sample Standard Deviation
sd(x)
# [1] 0.780586
sqrt(sum((x - mean(x))^2)/(n - 1))
# [1] 0.780586
sqrt(sum(x^2 - 2*mean(x)*x + mean(x)^2)/(n - 1)) # # Would like the Population Standard Deviation equivalent using this.
# [1] 0.780586
sqrt( (n/(n-1)) * ( ( (sum(x^2)/(n)) ) - (sum(x)/n) ^2 ) )
# [1] 0.780586
Now, the Population Standard Deviation needs to match sd(x) with 100 count.
# Population Standard Deviation
n <- 100
set.seed(1)
x <- rnorm(x, 10)
sd(x)
# [1] 0.780586
sqrt(sum((x - mean(x))^2)/(n))
# [1] 0.2341758
sqrt(sum(x^2 - 2*mean(x)*x + mean(x)^2)/(n))
# [1] 0.2341758
# Got this to work above using (eventual goal, to fix the below):
# https://en.wikipedia.org/wiki/Algebraic_formula_for_the_variance
sqrt( (n/(n-1)) * ( ( (sum(x^2)/(n)) ) - (sum(x)/n) ^2 ) ) # Would like the Population Standard Deviation equivalent using this.
# [1] 3.064027
Please check the question. The first argument of rnorm
should be n.
The population and sample standard deviations are:
sqrt((n-1)/n) * sd(x) # pop
## [1] 0.8936971
sd(x) # sample
## [1] 0.8981994
They can also be calculated like this:
library(sqldf)
library(RH2)
sqldf("select stddev_pop(x), stddev_samp(x) from X")
## STDDEV_POP("x") STDDEV_SAMP("x")
## 1 0.8936971 0.8981994
Note: We used this test data:
set.seed(1)
n <- 100
x <- rnorm(n)
X <- data.frame(x)
I think that the easiest way is to just define it quickly from sd
:
sd.p=function(x){sd(x)*sqrt((length(x)-1)/length(x))}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With