
What is the fastest way to calculate first two principal components in R?

I am using princomp in R to perform PCA. My data matrix is huge (10K x 10K, with each value given to 4 decimal places). It takes ~3.5 hours and ~6.5 GB of physical memory on a 2.27 GHz Xeon processor.

Since I only want the first two components, is there a faster way to do this?

Update:

In addition to speed, is there a memory-efficient way to do this?

It takes ~2 hours and ~6.3 GB of physical memory to calculate the first two components using svd(,2,).

asked Nov 28 '11 17:11 by 384X21




1 Answer

You sometimes get access to so-called 'economical' decompositions, which allow you to cap the number of eigenvalues / eigenvectors. It looks like eigen() and prcomp() do not offer this, but svd() allows you to specify the maximum number to compute.
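As a quick illustration of what those arguments do, here is a minimal sketch on an arbitrary random matrix (the names A and s exist only for this example). The nu and nv arguments of svd() limit how many left and right singular vectors are returned, while the d component still holds every singular value:

```r
set.seed(1)
A <- matrix(rnorm(20 * 5), 20, 5)   # arbitrary 20 x 5 example matrix
s <- svd(A, nu = 2, nv = 0)         # request 2 left singular vectors and no right ones

length(s$d)   # number of singular values returned: min(20, 5)
dim(s$u)      # dimensions of the left singular vector matrix
is.null(s$v)  # the v component is absent when nv = 0
```

So the cap applies to the vectors rather than to the singular values themselves; how much time it saves will depend on the matrix size and the underlying LAPACK build.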

On small matrices, the gains seem modest:

R> set.seed(42); N <- 10; M <- matrix(rnorm(N*N), N, N)
R> library(rbenchmark)
R> benchmark(eigen(M), svd(M,2,0), prcomp(M), princomp(M), order="relative")
          test replications elapsed relative user.self sys.self user.child
2 svd(M, 2, 0)          100   0.021  1.00000      0.02        0          0
3    prcomp(M)          100   0.043  2.04762      0.04        0          0
1     eigen(M)          100   0.050  2.38095      0.05        0          0
4  princomp(M)          100   0.065  3.09524      0.06        0          0
R> 

but the factor of three relative to princomp() may make it worth your while to reconstruct princomp() from svd(), as svd() allows you to stop after two values.
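The reconstruction suggested above could look roughly like the sketch below. It is not code from this answer: the helper name pca_first_k is hypothetical, and it assumes you want column-centred, unscaled data with the n - 1 divisor that prcomp() uses (princomp() divides by n instead, so its standard deviations differ by a constant factor).

```r
# Hypothetical helper: first k principal components from svd().
pca_first_k <- function(X, k = 2) {
  # Centre each column; no rescaling, matching the prcomp() defaults.
  Xc <- scale(X, center = TRUE, scale = FALSE)
  # Request only k right singular vectors and no left ones.
  s <- svd(Xc, nu = 0, nv = k)
  list(
    sdev     = s$d[seq_len(k)] / sqrt(nrow(X) - 1),  # component standard deviations
    rotation = s$v,                                  # loadings (p x k)
    x        = Xc %*% s$v                            # scores (n x k)
  )
}

set.seed(42)
M   <- matrix(rnorm(100 * 10), 100, 10)
fit <- pca_first_k(M, 2)
ref <- prcomp(M)   # full PCA, used here only as a cross-check

# Component signs are arbitrary, so compare absolute values.
all.equal(fit$sdev, ref$sdev[1:2])
all.equal(abs(fit$x), abs(unname(ref$x[, 1:2])))
```

On a 10K x 10K matrix the centred copy Xc is itself a second full-size matrix, so if memory is the bottleneck it may be worth centring in place or removing the original before calling svd().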

answered Sep 28 '22 10:09 by Dirk Eddelbuettel