
LARGE covariance matrix in R

Tags: r, covariance

From gene expression data (40000 genes (variables) x 30 observations) I want to create a 40000 x 40000 covariance matrix. This is definitely larger than my RAM. With the 'ff' package I managed to preallocate an empty 40000 x 40000 matrix for the correlations. However, the 'cov' or 'cor' function can only manage a 5000 x 5000 covariance matrix on my system, so I have to do blockwise covariance calculations (1:5000, 5001:10000, etc.) and fill the preallocated matrix along the diagonal. Does anybody know of an algorithm to fill the "missing patches" in the matrix, i.e. the covariance (or correlation) between, say, variables 1 and 22000? I know I can do all pairwise combinations and fill in the matrix one by one, but 'cor' is quite fast... So, is there a way to calculate the cov (or cor) of 1/22000 by using the already calculated covariances?

Thanks in advance!

anspiess asked Feb 16 '13 14:02




1 Answer

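You can call cov with two arguments to compute the off-diagonal blocks. Since cov treats columns as variables, x here needs the observations in rows and the genes in columns (30 x 40000):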

cov( x[,1:5000], x[,5001:10000] )
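
The same two-argument call can be looped over every pair of blocks to fill the whole matrix. What follows is only a minimal sketch under stated assumptions, not a tested solution for the full data set: the sizes n, p and block are small stand-ins for 30, 40000 and 5000, x is simulated, and res is an ordinary in-memory matrix standing in for the preallocated 'ff' matrix from the question. Only the upper-triangle blocks are computed; each one is mirrored into the lower triangle by transposition.

```r
# Sketch: fill a p x p covariance matrix block by block.
# Assumption: x is an n x p numeric matrix (observations in rows, genes in columns).
set.seed(1)
n <- 30       # number of observations
p <- 200      # stand-in for the 40000 genes
block <- 50   # stand-in for the 5000-column block size

x <- matrix(rnorm(n * p), nrow = n, ncol = p)   # simulated data, illustration only

# Stand-in for the preallocated result; with the real data this would be
# the on-disk 'ff' matrix mentioned in the question.
res <- matrix(NA_real_, nrow = p, ncol = p)

starts <- seq(1, p, by = block)
for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + block - 1, p)
  for (j in i:length(starts)) {
    cols <- starts[j]:min(starts[j] + block - 1, p)
    # Two-argument cov: covariances between the columns of the first
    # argument and the columns of the second.
    b <- cov(x[, rows, drop = FALSE], x[, cols, drop = FALSE])
    res[rows, cols] <- b
    # The covariance matrix is symmetric, so mirror off-diagonal blocks.
    if (j > i) res[cols, rows] <- t(b)
  }
}

# Sanity check against the direct computation (feasible only at this small size).
ok <- isTRUE(all.equal(res, cov(x)))
```

The same loop should work with cor in place of cov, since cor also accepts a second argument. With 8 blocks of 5000 columns this amounts to 36 calls instead of 64, because the mirrored blocks are not recomputed.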
Vincent Zoonekynd answered Sep 22 '22 19:09