Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Matlab repeated matrix multiplication - loop vs built-in performances

Given a matrix A, I need to multiply with other n vectors Bi (i.e. i=1...n). The size of A can be like 5000x5000 and thus Bi like 5000x1.

If I evaluate the product in the following way:

for i=1:n
    product=A*Bi;
    % do something with product
end

The result is way (orders of magnitude) slower than computing the products like:

%assume that S is a matrix that contains the vectors Bi as columns, i.e. S(:,i)=Bi, then:
results=A*S;   %stores all the products in matrix form
% do something with results

The problem is that the number n of vectors Bi may be too big to be stored in memory, for example n=300000, so I need to use a loop approach where each time I evaluate the product, use it and then discard the vector Bi.

Why is such an approach so slow compared to direct multiplication, and are there ways to overcome this?

like image 397
xanz Avatar asked Jan 08 '16 11:01

xanz


2 Answers

You could try loop over batches so for example

for i = 0:(n/k)-1
    product = A*S(:,(i*k+1):(i+1)*k)
end

And adjust k to find the best trade off of speed and memory for you.

MATLAB's loops are slow because it is an interpreted language. So it has to work out a lot of stuff on the fly. The loops are greatly improved these days because of the JIT compiler, but they are still slow compared to the built-in functions which are written and compiled in C. Further to that, they use really cutting edge super fast matrix multiplication algorithms, as compared with your fairly naive algorithm achieved by looping, which also aid in the speed-up you are experiencing.

like image 150
Dan Avatar answered Nov 15 '22 07:11

Dan


For simplicity my answer will assume a n-by-n square matrix A, but it is true for non squares as well.

Your loop approach uses matrix vector multiplication. The naive solution is also the best known, resulting in a runtime of O(n^2) which is repeated n times. You end up with a total runtime of O(n^3).

For matrix multiplication, there is a better approach. The best known algorithm only requires little less than O(n^2.4) runtime, making it much faster for large number.

You will achieve a better runtime when multiplying multiple vectors Bi at once using matrix multiplication. This will not achieve the performance of a pure matrix multiplication, but working with larger slices of b is probably the fastest memory efficient solution.

Some Code for the different discussed approaches:

n=5000;
k=100;
A=rand(n,n);
S=rand(n,n);
workers=matlabpool('size');
%for a parfor solution, the batch size must be smaller because multiple batches are stred in memory at once
kparallel=k/workers;
disp('simple loop:');
tic;
for i = 1:n
    product = A*S(:,n);
end
toc
disp('batched loop:');
tic;
for i = 1:(n/k)
    product = A*S(:,(i-1)*k+1:(i)*k);
end
toc
disp('batched parfor loop:');
tic;
parfor i = 1:(n/kparallel)
    product = A*S(:,(i-1)*kparallel+1:(i)*kparallel);
end
toc
disp('matrix multiplication:');
tic;
A*S;
toc
like image 30
Daniel Avatar answered Nov 15 '22 05:11

Daniel