
Population variance in R


How can I calculate the population variance of my data using R?

I read that there is a package called popvar, but I have version 0.99.892 and I can't find the package.

YazminRios asked Jun 09 '16 18:06



People also ask

How do you find the population variance?

The population variance is the variance of the population. To calculate the population variance, use the formula $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$, where $N$ is the size of the population consisting of $x_1, x_2, …$

What is the variance function in R?

The var() function in R computes the sample variance of a vector, a measure of how far the values spread around their mean. Syntax: var(x), where x is a numeric vector.

What is population in variance?

Population variance is a measure of the spread of population data. It is the average of the squared distances from each data point in a population to the population mean, so it indicates how widely the data points are spread out.

How do I get SD in R?

Finding the standard deviation of a set of values in R is easy: R provides the built-in function sd(), which returns the sample standard deviation. You can pass it a numeric vector you create yourself or a column imported from a CSV file.


2 Answers

The var() function in base R calculates the sample variance, which is n / (n - 1) times the population variance. So one way to calculate the population variance is var(myVector) * (n - 1) / n, where n is the length of the vector. Here is an example:

    x <- 1:10
    var(x) * 9 / 10
    [1] 8.25

From the definition of population variance:

    sum((x - mean(x))^2) / 10
    [1] 8.25
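Both snippets hard-code n = 10. A minimal sketch that takes n from the data instead, assuming x is a numeric vector with no missing values (the name pop_var is made up for illustration and is not a base R function):

```r
# pop_var is a hypothetical helper name, not part of base R.
# Assumes x is a numeric vector without NA values.
pop_var <- function(x) {
  n <- length(x)
  # Rescale the sample variance (denominator n - 1) to denominator n.
  var(x) * (n - 1) / n
}

x <- 1:10
pop_var(x)                        # rescaled sample variance
sum((x - mean(x))^2) / length(x)  # definition of population variance
```

The two results should agree up to floating-point error. One caveat: var() returns NA for a single value, so this rescaling does too, whereas the definition gives 0.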
Psidom answered Oct 11 '22 23:10



You already have a great answer, but I'd like to show that you can easily make your own convenience functions. It is surprising that base R has no population variance or standard deviation function, even though Excel/Calc and other software provide one. Such a function would not be difficult to add: it could be named sdp or sd.p, or be invoked as sd(x, pop = TRUE).

Here is a basic version of population variance with no type-checking:

    x <- 1:10
    varp <- function(x) mean((x - mean(x))^2)
    varp(x)
    ## [1] 8.25
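If you want something a little more defensive than the basic version, here is one possible sketch with a simple type check and optional NA removal, plus a matching population standard deviation. The names varp and sdp and the na.rm argument are illustrative choices; none of this exists in base R:

```r
# varp and sdp are illustrative names, not base R functions.
varp <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))        # basic type check
  if (na.rm) x <- x[!is.na(x)]    # optionally drop missing values
  mean((x - mean(x))^2)           # mean squared deviation (denominator n)
}

# Population standard deviation: square root of the population variance.
sdp <- function(x, na.rm = FALSE) sqrt(varp(x, na.rm = na.rm))

varp(1:10)
sdp(c(2, 4, 4, 4, 5, 5, 7, 9))
varp(c(1:10, NA), na.rm = TRUE)
```

With na.rm = FALSE (the default here, mirroring var() and sd()), any NA in the input propagates to the result.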

To scale up, if speed is an issue, colSums and/or colMeans may be used (see: https://rdrr.io/r/base/colSums.html)
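As a rough sketch of that idea, assuming the data sits in a numeric matrix with one variable per column and no missing values, the population variance of every column can be computed at once with colMeans (col_varp is a made-up name for illustration):

```r
# col_varp is a hypothetical helper name, not a base R function.
# Assumes m is a numeric matrix (or all-numeric data frame) without NA values.
col_varp <- function(m) {
  m <- as.matrix(m)
  centered <- sweep(m, 2, colMeans(m))  # subtract each column's mean
  colMeans(centered^2)                  # mean squared deviation per column
}

m <- cbind(a = 1:10, b = seq(2, 20, by = 2))
col_varp(m)
```

This avoids calling a function once per column, which is where the vectorised colMeans is likely to help on wide data.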

PatrickT answered Oct 12 '22 00:10
