How to approximate the correlation matrix of a large sparse SciPy matrix?

So far I have used the solution from the linked thread, but it raises a memory error, as expected, since my matrix A is 6 million × 40,000. I am therefore looking for another way to at least approximate the correlation matrix. How can I get around this problem? Any help is appreciated.

asked Nov 11 '22 17:11 by erogol


1 Answer

Your problem is that you can't hold the result in memory (6e6² ≈ 3.6 × 10¹³ values?).
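
To make the size concrete, here is a quick back-of-the-envelope check. It assumes the result is stored as a dense float64 matrix (8 bytes per value) and shows both readings of "correlation matrix": between the 6 million rows, or between the 40,000 columns. `dense_corr_bytes` is just an illustrative helper.

```python
def dense_corr_bytes(n, itemsize=8):
    """Bytes needed for a dense n x n correlation matrix (float64 by default)."""
    return n * n * itemsize


rows, cols = 6_000_000, 40_000
print(f"{dense_corr_bytes(rows) / 1e12:.1f} TB")  # row-by-row: 288.0 TB
print(f"{dense_corr_bytes(cols) / 1e9:.1f} GB")   # column-by-column: 12.8 GB
```

Even the column-by-column version needs about 12.8 GB before any intermediate arrays, and the row-by-row version is far beyond any single machine.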

You can drop rows from the original matrix. If, for example, you are searching for highly correlated rows, you may want to cluster the rows first, so that the problem breaks into smaller pieces.
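
A minimal sketch of the "break the problem up" idea follows. It assumes you already have one cluster label per row from some clustering step (the answer does not say which, so the `labels` array below is a hypothetical stand-in) and computes Pearson correlations only among rows in the same cluster. The helper names `sparse_row_corr` and `blockwise_row_corr` are illustrative. The sparse rows are never densified; only each k × k block result is. This uses the identity sum_k (x_ik - m_i)(x_jk - m_j) = (X Xᵀ)_ij - n·m_i·m_j.

```python
import numpy as np
import scipy.sparse as sp


def sparse_row_corr(X):
    """Pearson correlation between the rows of a sparse matrix X.

    X itself is never densified; only the (k x k) result is dense, so this
    is meant for a block of k rows, not for millions of rows at once.
    """
    X = sp.csr_matrix(X, dtype=np.float64)
    n_cols = X.shape[1]
    mean = np.asarray(X.sum(axis=1)).ravel() / n_cols  # row means m_i
    gram = (X @ X.T).toarray()                          # sum_k x_ik * x_jk
    # centred cross-products: gram_ij - n * m_i * m_j
    cov = (gram - n_cols * np.outer(mean, mean)) / (n_cols - 1)
    std = np.sqrt(np.diag(cov))                         # zero-variance rows give nan
    return cov / np.outer(std, std)


def blockwise_row_corr(X, labels):
    """Correlations only among rows that share a cluster label."""
    X = sp.csr_matrix(X)
    result = {}
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)          # original row indices
        result[label] = (idx, sparse_row_corr(X[idx]))
    return result


# Toy data; `labels` is a hypothetical stand-in for real cluster assignments.
X = sp.random(1000, 200, density=0.05, format="csr", random_state=0)
labels = np.arange(X.shape[0]) % 4
blocks = blockwise_row_corr(X, labels)
idx, corr = blocks[0]
print(len(idx), corr.shape)
```

The cost then scales with the sum of the squared cluster sizes instead of rows², which is the point of clustering first. Pairs of rows that land in different clusters are simply never compared.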

You can also use scipy.sparse.linalg.svds to reduce the number of columns. However, you will still have rows² correlations to handle.
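
One way to combine the SVD idea with correlations, sketched under assumptions that go slightly beyond the answer: apply `svds` to the row-centred matrix Xc = X - m·1ᵀ (m = row means), so that Xc Xcᵀ ≈ U S² Uᵀ. Normalising each row of U·S to unit length then gives k-dimensional vectors Z whose dot products approximate the row correlations. Xc is never formed, because centring would destroy sparsity; it is wrapped in a `LinearOperator` so `svds` only multiplies by the sparse X. The function name `approx_row_corr_factors` and the choice of `k` are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def approx_row_corr_factors(X, k):
    """Rank-k factors Z such that Z @ Z.T approximates the row correlations of X."""
    X = sp.csr_matrix(X, dtype=np.float64)
    n_rows, n_cols = X.shape
    mean = np.asarray(X.sum(axis=1)).ravel() / n_cols  # row means m

    def matvec(v):  # Xc @ v = X @ v - m * sum(v)
        v = np.ravel(v)
        return X @ v - mean * v.sum()

    def rmatvec(u):  # Xc.T @ u = X.T @ u - (m . u) * ones
        u = np.ravel(u)
        return X.T @ u - mean @ u

    Xc = LinearOperator((n_rows, n_cols), matvec=matvec, rmatvec=rmatvec,
                        dtype=np.float64)
    u, s, _ = svds(Xc, k=k)            # truncated SVD: Xc ~ U diag(s) V^T
    Z = u * s                          # k-dimensional row representations
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)  # unit length per row
    return Z                           # corr(i, j) ~ Z[i] @ Z[j]


X = sp.random(2000, 300, density=0.05, format="csr", random_state=0)
Z = approx_row_corr_factors(X, k=20)
print(Z.shape)
```

Z is only rows × k, so it fits in memory. Any single correlation is one dot product, but as the answer says, materialising all of Z @ Z.T is still rows² work. Constant (including all-zero) rows have zero variance, so their rows of Z come out as nan.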

answered Nov 14 '22 22:11 by cyborg