randomized SVD decomposes a matrix by extracting the first k singular values/vectors using k+p random projections. this works surprisingly well for large matrices.
my question concerns the singular values that are output from the algorithm. why aren't the values equal to the first k-singular values if you do the full SVD?
Below I have a simple implementation in R. Any suggestions on improving the performance would be appreciated.
rsvd = function(A, k=10, p=5) {
n = nrow(A)
y = A %*% matrix(rnorm(n * (k+p)), nrow=n)
q = qr.Q(qr(y))
b = t(q) %*% A
svd = svd(b)
list(u=q %*% svd$u, d=svd$d, v=svd$v)
}
> set.seed(10)
> A <- matrix(rnorm(500*500),500,500)
> svd(A)$d[1:15]
[1] 44.94307 44.48235 43.78984 43.44626 43.27146 43.15066 42.79720 42.54440 42.27439 42.21873 41.79763 41.51349 41.48338 41.35024 41.18068
> rsvd.o(A,10,5)$d
[1] 34.83741 33.83411 33.09522 32.65761 32.34326 31.80868 31.38253 30.96395 30.79063 30.34387 30.04538 29.56061 29.24128 29.12612 27.61804
I reckon that your algorithm is a modification of the algorithm of Martinsson et al.. If I understood it correctly, this is especially meant for approximations for low rank matrices. I might be wrong though.
The difference is easily explained by the huge difference between the actual rank of A (500) and the values of k (10) and p (5). Plus, Martinsson et al mention that the value for p should actually be larger than the chosen value for k.
So if we apply your solution taking their considerations into account, using :
set.seed(10)
A <- matrix(rnorm(500*500),500,500) # rank 500
B <- matrix(rnorm(500*50),500,500) # rank 50
We find for the timings that the use of a larger p value still results in a huge speed-up compared to the original svd algorithm.
> system.time(t1 <- svd(A)$d[1:5])
user system elapsed
0.8 0.0 0.8
> system.time(t2 <- rsvd(A,10,5)$d[1:5])
user system elapsed
0.01 0.00 0.02
> system.time(t3 <- rsvd(A,10,30)$d[1:5])
user system elapsed
0.04 0.00 0.03
> system.time(t4 <- svd(B)$d[1:5] )
user system elapsed
0.55 0.00 0.55
> system.time(t5 <-rsvd(B,10,5)$d[1:5] )
user system elapsed
0.02 0.00 0.02
> system.time(t6 <-rsvd(B,10,30)$d[1:5] )
user system elapsed
0.05 0.00 0.05
> system.time(t7 <-rsvd(B,25,30)$d[1:5] )
user system elapsed
0.06 0.00 0.06
But we see that using a higher p for a lower rank matrix indeed gives a better approximation. If we let k also approach the rank a bit closer, the difference between the real solution and the approximation becomes appx. 0, while the speed gain is still substantial.
> round(mean(t2/t1),2)
[1] 0.77
> round(mean(t3/t1),2)
[1] 0.82
> round(mean(t5/t4),2)
[1] 0.92
> round(mean(t6/t4),2)
[1] 0.97
> round(mean(t7/t4),2)
[1] 1
So in general I believe that one could conclude that :
l
if I'm right)As far as I'm concerned, it's a neat way of doing it. I couldn't really find a more optimal way actually. The only thing I could say is that the construct t(q) %*% A
is advised against. One should use crossprod(q,A)
for that, which is supposed to be a tiny bit faster. But in your example the difference was nonexistent.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With