
MATLAB Speed Optimisation

Can anyone help? I am a fairly experienced MATLAB user but am having trouble speeding up the code below.

The fastest time I have achieved for one run through all three loops, using 12 cores, is ~200 s. The actual function will be called ~720 times, so at this rate it will take about 40 hours to execute. According to the MATLAB profiler, the majority of CPU time is spent in the call to exp.

I managed to speed this up substantially by using a gpuArray and running the exp call on a Quadro 4000 graphics card. However, the workstation has only one graphics card, so this prevents the parfor loop from being used, which obliterates any gains.

Is this code close to the optimum that can be achieved using MATLAB? I have also written a very crude C++ implementation with OpenMP but achieved little gain.

Many thanks in advance

function SPEEDtest_CPU

% Variable setup:
% - For testing I'll use random variables. These will actually be fed into 
%   the function for the real version of this code.
sy    = 320;
sx    = 100;
sz    = 32;
A     = complex(rand(sy,sx,sz),rand(sy,sx,sz));
B     = complex(rand(sy,sx,sz),rand(sy,sx,sz));
C     = rand(sy,sx);
D     = rand(sy*sx,1);
F     = zeros(sy,sx,sz);
x     = rand(sy*sx,1);  
y     = rand(sy*sx,1);
x_ind = (1:sx) - (sx / 2) - 1;
y_ind = (1:sy) - (sy / 2) - 1;


% MAIN LOOPS 
%  - In the real code this set of three loops will be called ~720 times!
%  - Using 12 cores, the fastest I have managed is ~200 seconds for one
%    call of this function.
tic
for z = 1 : sz
    A_slice = A(:,:,z);
    A_slice = A_slice(:);
    parfor cx = 1 : sx       
        for cy = 1 : sy       
            E = ( x .* x_ind(cx) ) + ( y .* y_ind(cy) ) + ( C(cy,cx) .* D );                                                          

            F(cy,cx,z) = (B(cy,cx,z) .* exp(-1i .* E))' * A_slice; 
        end       
    end   
end
toc

end
Asked by jack on Oct 04 '13 at 10:10


3 Answers

Some things to think about:

Have you considered using single precision (single) instead of double?

Can you vectorize the cx and cy loops so that they become array operations?

Consider changing the floating-point rounding or signalling modes.
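
To make the single-precision suggestion concrete, here is a minimal sketch that evaluates one (cx, cy) term in both precisions and reports the relative error. The reduced size n and the example values xi, yi, b and c are assumptions for illustration, not values from the question. Whether the error is acceptable depends on the application, and any speedup should be measured on the real data.

```matlab
% Sketch: compare double and single precision for one (cx, cy) term.
% Sizes are reduced so that it runs quickly; names follow the question.
rng(0);                               % reproducible random test data
n  = 2000;                            % stands in for sy*sx (assumption)
x  = rand(n,1);  y = rand(n,1);  D = rand(n,1);
a  = complex(rand(n,1), rand(n,1));   % stands in for A_slice
b  = complex(rand, rand);             % stands in for B(cy,cx,z)
c  = rand;                            % stands in for C(cy,cx)
xi = -3;  yi = 7;                     % example x_ind(cx), y_ind(cy) values

% Double-precision reference, same expression as in the question.
E_d = (x .* xi) + (y .* yi) + (c .* D);
f_d = (b .* exp(-1i .* E_d))' * a;

% Single precision: cast the inputs once, outside any loop.
xs = single(x);  ys = single(y);  Ds = single(D);  as = single(a);
E_s = (xs .* single(xi)) + (ys .* single(yi)) + (single(c) .* Ds);
f_s = (single(b) .* exp(-1i .* E_s))' * as;

rel_err = abs(double(f_s) - f_d) / abs(f_d);
fprintf('class: %s, relative error: %.2e\n', class(f_s), rel_err);
```

Casting once before the loops matters: converting inside the inner loop would add work on every iteration.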

Answered by Mikhail on Oct 06 '22 at 14:10


If your data are real (not complex), as in your example, you can save time by replacing

(B(cy,cx,z) .* exp(-1i .* E))'

by

(B(cy,cx,z) .* (cos(E)+1i*sin(E))).'

Specifically, on my machine (cos(x)+1i*sin(x)).' takes 19% less time than exp(-1i .* x)'.


If A and B are complex: E is still real, so you can precompute Bconj = conj(B) outside the loops (this takes about 10 ms with your data size, and it's done only once) and then replace

(B(cy,cx,z) .* exp(-1i .* E))'

by

(Bconj(cy,cx,z) .* (cos(E)+1i*sin(E))).'

to obtain a similar gain.
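
The substitution works because ' is the conjugate transpose: it conjugates both factors, and for real E the conjugate of exp(-1i*E) is exp(1i*E) = cos(E) + 1i*sin(E). The following is a minimal sketch that checks this numerically for complex B. The vector length n and the random test values are assumptions for illustration only.

```matlab
% Sketch: check that the cos/sin form matches the original expression.
rng(1);                               % reproducible random test data
n = 1000;                             % illustrative length (assumption)
E = 20 * rand(n,1) - 10;              % real phase vector, as in the question
a = complex(rand(n,1), rand(n,1));    % stands in for A_slice
b = complex(rand, rand);              % stands in for B(cy,cx,z)

% Original: conjugate transpose of the product.
f_orig = (b .* exp(-1i .* E))' * a;

% Rewritten: conjugate b explicitly, then use the plain transpose.
f_new = (conj(b) .* (cos(E) + 1i*sin(E))).' * a;

fprintf('difference: %.2e\n', abs(f_orig - f_new));
```

The two results should agree to within floating-point rounding. The timing difference quoted above is machine dependent, so it is worth re-measuring with tic/toc or timeit on the target workstation.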

Answered by Luis Mendo on Oct 06 '22 at 15:10


There are two main ways of speeding up MATLAB code: preallocation and vectorisation.

You have preallocated well but there is no vectorisation. In order to best learn how to do this you need to have a good grasp of linear algebra and the use of repmat to expand vectors into multiple dimensions.

Vectorisation can result in multiple orders of magnitude speedup and will use the cores optimally (provided the flag is up).

What is the mathematical expression you are calculating? I may be able to lend a hand.
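
As one possible way to vectorise the loops in the question, here is a sketch under stated assumptions rather than a tuned implementation. It relies on two observations: E does not depend on z, so the exponential can be evaluated once per cx and reused for every slice, and the cy loop can be folded into a matrix product. It uses outer products and bsxfun rather than repmat. The small sizes are assumptions chosen so that the result can be checked against the original triple loop.

```matlab
% Sketch: vectorise over cy and reuse the exponential for every z.
% Reduced sizes so that the comparison with the loop version runs quickly.
rng(2);
sy = 8;  sx = 6;  sz = 3;  n = sy*sx;
A = complex(rand(sy,sx,sz), rand(sy,sx,sz));
B = complex(rand(sy,sx,sz), rand(sy,sx,sz));
C = rand(sy,sx);  D = rand(n,1);
x = rand(n,1);    y = rand(n,1);
x_ind = (1:sx) - (sx/2) - 1;
y_ind = (1:sy) - (sy/2) - 1;

% Reference: the triple loop from the question (plain for, no parfor).
F_ref = zeros(sy,sx,sz);
for z = 1:sz
    A_slice = reshape(A(:,:,z), [], 1);
    for cx = 1:sx
        for cy = 1:sy
            E = (x .* x_ind(cx)) + (y .* y_ind(cy)) + (C(cy,cx) .* D);
            F_ref(cy,cx,z) = (B(cy,cx,z) .* exp(-1i .* E))' * A_slice;
        end
    end
end

% Vectorised: exp is evaluated once per cx instead of once per (cx, z).
Amat = reshape(A, n, sz);              % column z is A(:,:,z) flattened
F = zeros(sy,sx,sz);
for cx = 1:sx                          % candidate for parfor
    % n-by-sy phase matrix; column cy equals E for this (cx, cy).
    E = bsxfun(@plus, x .* x_ind(cx), y * y_ind + D * C(:,cx).');
    G = exp(1i .* E).' * Amat;         % sy-by-sz, a single matrix product
    F(:,cx,:) = conj(reshape(B(:,cx,:), sy, sz)) .* G;
end

fprintf('max abs difference: %.2e\n', max(abs(F(:) - F_ref(:))));
```

At the sizes in the question (sy = 320, sx = 100, sz = 32) the phase matrix is 32000-by-320, which is roughly 82 MB as real doubles and about 164 MB once exponentiated, per cx iteration. That should be kept in mind if the cx loop is run under parfor with many workers. The number of exp evaluations drops by a factor of sz, but the actual gain needs to be measured.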

Answered by Luke on Oct 06 '22 at 14:10