Doing PCA in Java on a large matrix

I have a very large matrix (about 500000 * 20000) containing the data that I would like to analyze with PCA. To do this I'm using the Parallel Colt library, trying both singular value decomposition and eigenvalue decomposition to get the eigenvectors and eigenvalues of the covariance matrix. But these methods exhaust the heap and I get "OutOfMemory" errors...

Even when using SparseDoubleMatrix2D (the data are very sparse) the errors remain, so I ask you: how can I solve this problem?

Should I change libraries?

asked Nov 04 '22 by dacanalr

1 Answer

You can compute PCA with Oja's rule: it's an iterative algorithm that improves an estimate of the principal components one vector at a time. It's slower than the usual batch PCA, but it only requires you to store one weight vector in memory (plus the current data row), rather than the full covariance matrix. It's also very numerically stable.

http://en.wikipedia.org/wiki/Oja%27s_rule
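
For illustration, here is a minimal sketch of Oja's rule in plain Java for the first principal component. It assumes mean-centered rows held as dense double[] arrays; for your 500000 * 20000 sparse matrix you would instead stream rows from disk and touch only the nonzero entries in the inner loops. The class and method names are placeholders, not part of any library:

    import java.util.Random;

    public class OjaPCA {

        // Estimate the first principal component of mean-centered rows with
        // Oja's rule: w <- w + eta * y * (x - y * w), where y = x . w.
        // Only the weight vector and the current row live in memory.
        static double[] firstComponent(double[][] rows, int epochs, double eta) {
            int d = rows[0].length;
            double[] w = new double[d];
            Random rnd = new Random(42);
            for (int j = 0; j < d; j++) w[j] = rnd.nextGaussian();
            normalize(w);

            for (int e = 0; e < epochs; e++) {
                for (double[] x : rows) {
                    double y = dot(x, w);                     // projection of x on w
                    for (int j = 0; j < d; j++)
                        w[j] += eta * y * (x[j] - y * w[j]);  // Oja update
                }
                normalize(w);  // guard against numerical drift
            }
            return w;
        }

        static double dot(double[] a, double[] b) {
            double s = 0.0;
            for (int j = 0; j < a.length; j++) s += a[j] * b[j];
            return s;
        }

        static void normalize(double[] w) {
            double n = Math.sqrt(dot(w, w));
            for (int j = 0; j < w.length; j++) w[j] /= n;
        }
    }

Further components can then be obtained by deflation: project each component you have already found out of the data (or out of each incoming row) and run the same update again.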

answered Nov 15 '22 by Monkey