From gene expression data (40000 genes (variables) x 30 observations) I want to create a 40000 x 40000 covariance matrix. This is definitely larger than my RAM. With the package 'ff' I managed to preallocate an empty 40000 x 40000 matrix for the correlations. However, the 'cov' or 'cor' function can only manage a 5000 x 5000 covariance matrix on my system, so I have to do blockwise covariance calculations (1:5000, 5001:10000, etc.) and fill the preallocated matrix along the diagonal. Does anybody know of an algorithm to fill the "missing patches" in the matrix, e.g. the covariance (or correlation) between genes 1 and 22000? I know I can do all pairwise combinations and fill in the matrix one by one, but 'cor' is quite fast... So, is there a way to calculate the cov (or cor) of 1/22000 by using the already calculated covariances?
Thanks in advance!
Covariance is positive when two variables tend to increase together and negative when one tends to decrease as the other increases. Its magnitude depends on the units and spread of the variables, so a large value does not by itself indicate a strong relationship, and a small value does not by itself indicate a weak one.
To create a covariance matrix from a data frame in R, use the cov() function. It takes the data frame (or a numeric matrix) as its argument, treats each column as a variable, and returns the variance-covariance matrix.
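As a small illustration of the call described above, here is a sketch on a made-up data frame (the column names and values are invented for the example, not taken from the question):

```r
# Toy data frame: 3 numeric variables (columns), 5 observations (rows).
# All values are made up for illustration.
df <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 10),
  c = c(5, 3, 4, 1, 2)
)

# cov() treats each column as a variable and returns a 3 x 3 matrix:
# variances on the diagonal, covariances off the diagonal.
S <- cov(df)
print(S)
```

The result is symmetric, so S["a", "b"] and S["b", "a"] hold the same value.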
Unlike correlation, covariance is not bounded between -1 and 1, so a large covariance alone does not imply a strong relationship: its size depends on the scale (units and spread) of the variables. A high covariance indicates a strong relationship, and a low one a weak relationship, only when judged against the variables' standard deviations, which is exactly the rescaling that correlation performs.
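The scale dependence can be checked with a short sketch on simulated data (the variables x and y and the factor 1000 are arbitrary choices for the example):

```r
# Simulated data: y is x plus noise, so the two are positively related.
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)

# Rescaling x (e.g. changing its units) multiplies the covariance by the
# same factor, but leaves the correlation unchanged.
cov_original <- cov(x, y)
cov_rescaled <- cov(1000 * x, y)
cor_original <- cor(x, y)
cor_rescaled <- cor(1000 * x, y)

print(c(cov_original, cov_rescaled))
print(c(cor_original, cor_rescaled))
```

The relationship between the variables is the same in both cases; only the covariance changes.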
You can use cov with 2 arguments to compute the off-diagonal blocks:
cov( x[,1:5000], x[,5001:10000] )
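The two-argument call can be wrapped in a loop over all pairs of blocks. The following is only a sketch under stated assumptions: x has observations in rows and genes in columns (so the 40000 x 30 data from the question would first be transposed with t()), the sizes are shrunk to a toy 30 x 200 matrix, and an ordinary in-memory matrix S stands in for the preallocated ff matrix. Block-indexed assignment into an ff matrix is assumed to work the same way but is not tested here. Note that the off-diagonal blocks have to be computed from the data; they cannot be recovered from the diagonal blocks alone.

```r
# Hypothetical stand-in for the real data: 30 observations x 200 genes.
set.seed(42)
n_obs <- 30
n_genes <- 200
block_size <- 50
x <- matrix(rnorm(n_obs * n_genes), nrow = n_obs, ncol = n_genes)

# Column index ranges of each block: 1:50, 51:100, ...
starts <- seq(1, n_genes, by = block_size)
blocks <- lapply(starts, function(s) s:min(s + block_size - 1, n_genes))

# In the real problem this would be the preallocated ff matrix.
S <- matrix(NA_real_, n_genes, n_genes)

for (i in seq_along(blocks)) {
  for (j in i:length(blocks)) {
    bi <- blocks[[i]]
    bj <- blocks[[j]]
    # Covariances between the genes in block i and the genes in block j.
    block <- cov(x[, bi, drop = FALSE], x[, bj, drop = FALSE])
    S[bi, bj] <- block
    # The matrix is symmetric, so the mirrored block is just the transpose.
    if (j > i) S[bj, bi] <- t(block)
  }
}
```

Only the blocks on and above the diagonal are computed, which roughly halves the work; replacing cov with cor gives the blockwise correlation matrix in the same way.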