For this purpose I used the solution from that thread link for now, but it gives a memory error, as expected, since my matrix A has 6 million rows and 40,000 columns. I am therefore looking for any other solution, even one that only approximates the correlation matrix. How can I overcome this problem? Any help is appreciated.
Your problem is that you can't hold the result in memory: a row-row correlation matrix for 6e6 rows has (6e6)^2 ≈ 3.6e13 entries.
You can drop rows from the original matrix. If, for example, you are searching for highly correlated rows, you may want to cluster the rows first, in order to break the problem into smaller pieces. Alternatively, you can compute the correlations in chunks and keep only the pairs you actually care about, so the full matrix is never materialized.
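A minimal sketch of the chunked approach: standardize the rows so a dot product equals a Pearson correlation, then scan one block of rows at a time and keep only pairs above a threshold. The function name, threshold, and chunk size are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def high_corr_pairs(A, threshold=0.95, chunk=256):
    """Scan row-row correlations block by block, keeping only pairs
    above `threshold`, so the full rows x rows matrix is never built.
    Sketch only: assumes A itself fits in memory."""
    # Standardize rows: after this, A @ A.T holds Pearson correlations.
    A = A - A.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    A = A / np.where(norms == 0, 1, norms)

    n = A.shape[0]
    pairs = []
    for i in range(0, n, chunk):
        # Only a (chunk, n) slice of the correlation matrix exists at once.
        block = A[i:i + chunk] @ A.T
        rows, cols = np.nonzero(block > threshold)
        for r, c in zip(rows, cols):
            if i + r < c:  # keep each pair once (upper triangle)
                pairs.append((i + r, c, block[r, c]))
    return pairs
```

Peak memory here is roughly `chunk * n` floats instead of `n * n`, so the chunk size trades speed for memory.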
You can also use scipy.sparse.linalg.svds to shrink the number of columns. But you will still have to handle rows^2 correlations.
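A short sketch of the svds idea, on a small stand-in matrix (the shape and k are placeholder values, not taken from the question): keep the top k singular vectors, then work with the projected rows, whose pairwise correlations approximate those of the original rows.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Small stand-in for the real 6e6 x 40000 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50))

# Truncated SVD: keep only the top k singular triplets.
k = 10
U, s, Vt = svds(A, k=k)

# Each row of A is now represented by k numbers instead of 50;
# row-row similarities of A_reduced approximate those of A.
A_reduced = U * s  # shape (1000, k)
print(A_reduced.shape)
```

This only reduces the column dimension; the number of row pairs is unchanged, which is why the chunked/clustered treatment of rows is still needed.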