I want to run svd() in R on a large sparse matrix (17k x 2m), and I have access to a cluster. Is there a straightforward way to calculate the SVD in R using multiple cores?
The RScaLAPACK package (http://www.inside-r.org/packages/cran/RScaLAPACK) would seem to make this possible, but it no longer appears to be actively maintained (http://cran.r-project.org/web/packages/RScaLAPACK/), and I assume there's a reason for that.
rARPACK is the package you need. It works like a charm, even with matrices much larger than your specification, and it is very fast because the heavy computation is done in compiled C and C++ code.
rARPACK is one choice, but be sure that you have an optimized multicore BLAS library, since the parallel computation does not happen in rARPACK itself but rather in BLAS.
Also, be aware that rARPACK only calculates a PARTIAL SVD, meaning it only computes the largest k singular values and their associated singular vectors. If you really do need the full SVD, you may still have to fall back on svd().
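To illustrate the partial-SVD point, here is a minimal sketch, assuming the rARPACK package is installed (its `svds()` function; the matrix sizes below are stand-ins, much smaller than the 17k x 2m matrix in the question):

```r
library(Matrix)    # sparse-matrix classes
library(rARPACK)   # provides svds() for truncated SVD

# A small random sparse matrix standing in for the large one.
set.seed(1)
A <- rsparsematrix(1000, 5000, density = 0.01)

# Partial SVD: only the k largest singular triplets are computed,
# which is far cheaper than a full svd() on a matrix this shape.
res <- svds(A, k = 10)

length(res$d)   # 10 singular values, in decreasing order
dim(res$u)      # 1000 x 10 left singular vectors
dim(res$v)      # 5000 x 10 right singular vectors
```

If the top few components are all you need (e.g. for PCA-style dimension reduction), this avoids ever forming the full decomposition.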
You may also consider bigstatsr::big_SVD(). I've tested it, and it tends to be faster than my GPU-based approach when I was working on large methylation datasets. It's not parallel, but I found its performance really is pretty remarkable.
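A minimal sketch of that approach, assuming the bigstatsr package is installed; it operates on a file-backed matrix (FBM), so the data never has to fit in RAM all at once (the sizes here are illustrative, not from the question):

```r
library(bigstatsr)

# A small dense file-backed matrix standing in for a large dataset.
set.seed(1)
X <- FBM(1000, 500, init = rnorm(1000 * 500))

# Partial SVD of the FBM: k singular triplets, computed out-of-core.
svd_res <- big_SVD(X, k = 10)

length(svd_res$d)   # 10 singular values
dim(svd_res$u)      # 1000 x 10
dim(svd_res$v)      # 500 x 10
```

Note that bigstatsr works with dense file-backed matrices rather than sparse ones, so for a genuinely sparse 17k x 2m matrix the rARPACK/RSpectra route may be a better fit.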