 

Calculating SVD using multiple cores in R

I want to run svd() in R on a large sparse matrix (17k x 2m), and I have access to a cluster. Is there a straightforward way to calculate SVD in R using multiple cores?

The RScaLAPACK package (http://www.inside-r.org/packages/cran/RScaLAPACK) would seem to make this possible, but it no longer appears to be actively supported (http://cran.r-project.org/web/packages/RScaLAPACK/) and I assume there's a reason for that.

Christopher O'Brien asked May 09 '13 05:05


3 Answers

rARPACK is the package you need. It works like a charm, even with matrices much larger than the one you describe. It is very fast because the heavy lifting is done in compiled C/C++ code.
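As a minimal sketch of how this might look (assuming the rARPACK and Matrix packages are installed; `svds()` is rARPACK's truncated-SVD routine, and the matrix here is much smaller than the 17k x 2m one in the question so the example runs quickly):

```r
library(Matrix)   # sparse matrix classes
library(rARPACK)  # svds() for truncated SVD

# A random sparse matrix standing in for the real data
set.seed(1)
A <- rsparsematrix(1000, 5000, density = 0.001)

# Compute only the top 10 singular values and vectors
res <- svds(A, k = 10)

length(res$d)  # 10 singular values, largest first
dim(res$u)     # 1000 x 10 left singular vectors
dim(res$v)     # 5000 x 10 right singular vectors
```

`svds()` accepts sparse matrices directly, so the full dense matrix never has to be materialized.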

Praveen Kumar answered Nov 18 '22 09:11


rARPACK is one choice, but make sure you have an optimized multicore BLAS library: the parallel computation happens in BLAS, not in rARPACK itself.

Also, be aware that rARPACK computes only a PARTIAL (truncated) SVD, meaning it returns the largest k singular values and the associated singular vectors. If you need the full SVD, you will still have to use svd().
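To check which BLAS your R session is actually linked against, something like the following should work (a sketch; `sessionInfo()` reports the BLAS/LAPACK paths in R >= 3.4, and the RhpcBLASctl package mentioned in the comments is my assumption, not something from the answer):

```r
# Print session details; in R >= 3.4 this includes lines like
#   BLAS:   /usr/lib/.../libopenblas.so
#   LAPACK: /usr/lib/.../liblapack.so
si <- sessionInfo()
print(si)

# If linked against OpenBLAS or MKL, the thread count can typically be
# controlled via environment variables before R starts, e.g.
# OPENBLAS_NUM_THREADS or MKL_NUM_THREADS, or at runtime via the
# RhpcBLASctl package (an assumption -- not mentioned in the answer):
#   RhpcBLASctl::blas_get_num_procs()
#   RhpcBLASctl::blas_set_num_threads(8)
```

If the reference (single-threaded) BLAS shows up here, switching to OpenBLAS or MKL is usually the single biggest speedup available.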

yixuan answered Nov 18 '22 08:11


You may also consider bigstatsr::big_SVD(). I've tested it, and it tended to be faster than my GPU when I was working on large methylation datasets. It's not parallel, but I found its performance really is pretty remarkable.
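A rough sketch of how this might be used (assuming the bigstatsr package; `big_SVD()` operates on bigstatsr's file-backed FBM matrices rather than ordinary R matrices, and the exact signature may differ across package versions):

```r
library(bigstatsr)

# A small file-backed matrix (FBM) standing in for a large dataset
set.seed(1)
X <- FBM(1000, 500, init = rnorm(1000 * 500))

# Partial SVD of the FBM: top 10 singular values/vectors
svd_res <- big_SVD(X, k = 10)

length(svd_res$d)  # 10 singular values
dim(svd_res$u)     # 1000 x 10
dim(svd_res$v)     # 500 x 10
```

Because the matrix lives on disk in an FBM, this approach can handle datasets larger than RAM, which matters for a 17k x 2m problem.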

James Dalgleish answered Nov 18 '22 10:11