
How to redefine cov to calculate population covariance matrix

Tags:

r

covariance

The standard cov function calculates the sample covariance matrix, but I want the population covariance matrix.

I tried the following:

cov.pop <- function(x,y=NULL) {
  cov(x,y)*(length(x)-1)/length(x)
}

> sapply(list(Apple,HP,Microsoft),cov.pop,y=Apple) #correct
[1] 0.7861672 0.1363396 0.2223303
> sapply(list(Apple,HP,Microsoft),cov.pop,y=HP) #correct
[1] 0.13633964 0.09560376 0.05226032
> sapply(list(Apple,HP,Microsoft),cov.pop,y=Microsoft) #correct
[1] 0.22233028 0.05226032 0.13519964
> cov.pop(cbind(Apple,HP,Microsoft)) #not correct
              Apple         HP  Microsoft
Apple     0.8444018 0.14643887 0.23879919
HP        0.1464389 0.10268552 0.05613145
Microsoft 0.2387992 0.05613145 0.14521443

My question
Is there a simple way to modify the cov.pop function to get the correct population covariance matrix?

vonjd asked Jun 07 '15 16:06


1 Answer

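The results differ because length(x) does not mean the same thing in the two cases. For the matrix cbind(Apple, HP, Microsoft), length() counts every element (rows times columns), whereas for each list element it counts the observations. NROW() returns the number of observations in both cases: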

cov.pop <- function(x, y = NULL) {
  cov(x, y) * (NROW(x) - 1) / NROW(x)
}
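To make the difference concrete, here is a minimal sketch comparing length() and NROW() on a vector and on a matrix built from it. The objects v and m are made-up illustrations, not the data from the question.

```r
# Illustrative data: 140 observations of 3 variables (names are hypothetical).
v <- seq_len(140)                        # a plain vector
m <- cbind(a = v, b = v^2, c = sqrt(v))  # a 140 x 3 matrix

length(v)  # 140: number of observations
length(m)  # 420: rows * columns, not the number of observations
NROW(v)    # 140
NROW(m)    # 140

# Correction factor the length()-based version applies to the matrix,
# versus the factor that is actually wanted:
(length(m) - 1) / length(m)  # 419/420
(NROW(m) - 1) / NROW(m)      # 139/140
```

Because NROW() treats a vector as a one-column matrix, the same function body works for both kinds of input.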

Using an example dataset

set.seed(24)
Apple <- rnorm(140)
HP <- rnorm(140)
Microsoft <- rnorm(140)

cov.pop(cbind(Apple,HP,Microsoft)) 
#                Apple          HP  Microsoft
#Apple     0.946489639 0.006511604 0.02518080
#HP        0.006511604 1.015532869 0.04940075
#Microsoft 0.025180805 0.049400745 1.08388185

sapply(list(Apple,HP,Microsoft),cov.pop,y=Apple)
#[1] 0.946489639 0.006511604 0.025180805

sapply(list(Apple,HP,Microsoft),cov.pop,y=HP)
#[1] 0.006511604 1.015532869 0.049400745

sapply(list(Apple,HP,Microsoft),cov.pop,y=Microsoft)
#[1] 0.02518080 0.04940075 1.08388185
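
As a sanity check, the rescaled result can be compared with the population covariance computed directly from its definition (centered cross-products divided by n). The sketch below is an illustration under the same simulated data as above. The names X, Xc and pop_direct are introduced here, and the comparison with cov.wt(method = "ML") from the stats package is an alternative route, not part of the original answer.

```r
# Same correction as above, repeated so this block is self-contained.
cov.pop <- function(x, y = NULL) {
  cov(x, y) * (NROW(x) - 1) / NROW(x)
}

set.seed(24)
X <- cbind(Apple = rnorm(140), HP = rnorm(140), Microsoft = rnorm(140))

# Population covariance from the definition: center each column,
# then divide the cross-product matrix by n (not n - 1).
n <- nrow(X)
Xc <- sweep(X, 2, colMeans(X))
pop_direct <- crossprod(Xc) / n

# Both comparisons should report TRUE up to floating-point tolerance.
isTRUE(all.equal(pop_direct, cov.pop(X)))
isTRUE(all.equal(cov.wt(X, method = "ML")$cov, cov.pop(X)))
```

If both checks pass, cov.wt(X, method = "ML")$cov is another way to get the population covariance matrix without writing a wrapper, although it accepts only a matrix or data frame, not a pair of vectors.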

akrun answered Nov 13 '22 16:11