Why is x \ y so much slower than (x' * x) \ (x' * y)?

For an NxP matrix x and an Nx1 vector y with N > P, the two expressions

x \ y                                       -- (1)

and

(x' * x) \ (x' * y)                         -- (2)

both compute the solution b to the matrix equation

x * b = y

in the least squares sense, i.e. so that the quantity

norm(y - x * b)

is minimized. Expression (2) does it using the classic algorithm for the solution of an ordinary least squares regression, where the left-hand argument to the \ operator is square. It is equivalent to writing

inv(x' * x) * (x' * y)                      -- (3)

but it uses an algorithm which is more numerically stable. It turns out that (3) is moderately faster than (2) even though (2) doesn't have to produce the inverse matrix as a byproduct, but I can accept that given the additional numerical stability.
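As a side note, the equivalence of the three expressions is easy to check in numpy (an assumed translation of the MATLAB code, with np.linalg.lstsq playing the role of \ for a tall x):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 1000, 5
x = rng.standard_normal((N, P))
y = rng.standard_normal(N)

# (1): x \ y  -- least-squares solve for tall x
b1, *_ = np.linalg.lstsq(x, y, rcond=None)
# (2): (x'*x) \ (x'*y)  -- normal equations, square solve
b2 = np.linalg.solve(x.T @ x, x.T @ y)
# (3): inv(x'*x) * (x'*y)  -- explicit inverse (less stable)
b3 = np.linalg.inv(x.T @ x) @ (x.T @ y)

assert np.allclose(b1, b2) and np.allclose(b1, b3)
```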

However, some simple timings (with N=100,000 and P=30) show that expression (2) is more than 5 times faster than expression (1), even though (1) has greater flexibility to choose the algorithm used! For example, any call to (1) could dispatch on the size of x, and in the case N>P it could reduce to (2), which would add a tiny amount of overhead but certainly wouldn't take 5 times longer.

What is happening in expression (1) that is causing it to take so much longer?


Edit: Here are my timings

x = randn(1e5, 30);
y = randn(1e5,1);
tic, for i = 1:100; x\y; end; t1=toc;
tic, for i = 1:100; (x'*x)\(x'*y); end; t2=toc;
assert( abs(norm(x\y) - norm((x'*x)\(x'*y))) < 1e-10 );
fprintf('Speedup: %.2f\n', t1/t2)

Speedup: 5.23

asked Apr 03 '14 by Chris Taylor

1 Answer

Note that in your test

size(x) == [1e5  30]    but   size(x'*x) == [30  30]
size(y) == [1e5   1]    but   size(x'*y) == [30   1]

That means that the matrices entering the mldivide function differ in size by almost four orders of magnitude! Any overhead in determining which algorithm to use is therefore large and significant, and of course the chosen algorithm then runs on two very differently sized problems.

In other words, you have a biased test. To make a fair test, use something like

x = randn(1e3);
y = randn(1e3,1);

I find (worst of 5 runs):

Speedup: 1.06  %// R2010a
Speedup: 1.16  %// R2010b
Speedup: 0.97  %// R2013a

...the difference has all but evaporated.

But, this does show very well that if you indeed have a regression problem with low dimensionality compared to the number of observations, it really pays off to do the multiplication first :)

mldivide is a catch-all, and really great at that. But often, having knowledge about the problem makes a more specific solution (pre-multiplication, pre-conditioning, lu, qr, linsolve, etc.) orders of magnitude faster.
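For instance, when you know x is tall with full column rank, you can factor it once and reuse the factorization. A numpy sketch of the qr route (a thin QR, as in MATLAB's qr(x, 0); here np.linalg.solve is used on the triangular factor for simplicity, though a dedicated triangular solver such as scipy.linalg.solve_triangular would be cheaper still):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((10_000, 30))
y = rng.standard_normal(10_000)

# Thin QR: x = q @ r with q (10000, 30) orthonormal, r (30, 30) upper-triangular.
q, r = np.linalg.qr(x)
# Least-squares solution via the factorization: r @ b = q' @ y.
b_qr = np.linalg.solve(r, q.T @ y)

# Matches the generic least-squares solver.
b_ref, *_ = np.linalg.lstsq(x, y, rcond=None)
assert np.allclose(b_qr, b_ref)
```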

answered Sep 20 '22 by Rody Oldenhuis