Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

vectorize/optimize this code in MATLAB?

I am building my first large-scale MATLAB program, and I've managed to write original vectorized code for everything so for until I came to trying to create an image representing vector density in stereographic projection. After a couple failed attempts I went to the Mathworks file exchange site and found an open source program which fits my needs courtesy of Malcolm Mclean. With a test matrix his function produces something like this:

Vector density in Stereographic Projection

And while this is almost exactly what I wanted, his code relies on a triply nested for-loop. On my workstation a test data matrix of size 25000x2 took 65 seconds in this section of code. This is unacceptable since I will be scaling up to a data matrices of size 500000x2 in my project.

So far I've been able to vectorize the innermost loop (which was the longest/worst loop), but I would like to continue and be rid of the loops entirely if possible. Here is Malcolm's original code that I need to vectorize:

dmap = zeros(height, width); % height, width: scalar with default value = 32
for ii = 0: height - 1          % 32 iterations of this loop
    yi = limits(3) + ii * deltay + deltay/2; % limits(3) & deltay: scalars
    for jj = 0 : width - 1      % 32 iterations of this loop
        xi = limits(1) + jj * deltax + deltax/2; % limits(1) & deltax: scalars
        dd = 0;
        for kk = 1: length(x)   % up to 500,000 iterations in this loop
            dist2 = (x(kk) - xi)^2 + (y(kk) - yi)^2;
            dd = dd + 1 / ( dist2 + fudge); % fudge is a scalar
        end
        dmap(ii+1,jj+1) = dd;
    end
end

And here it is with the changes I've already made to the innermost loop (which was the biggest drain on efficiency). This cuts the time from 65 seconds down to 12 seconds on my machine for the same test matrix, which is better but still far slower than I would like.

     dmap = zeros(height, width);
    for ii = 0: height - 1
        yi = limits(3) + ii * deltay + deltay/2;
        for jj = 0 : width - 1
            xi = limits(1) + jj * deltax + deltax/2;
            dist2 = (x - xi) .^ 2 + (y - yi) .^ 2;
            dmap(ii + 1, jj + 1) = sum(1 ./ (dist2 + fudge));
        end
    end
So my main question, are there any further changes I can make to optimize this code? Or even an alternative method to approach the problem? I've considered using C++ or F# instead of MATLAB for this section of the program, and I may do so if I cannot get to a reasonable efficiency level with the MATLAB code.

Please also note that at this point I don't have ANY additional toolboxes, if I did then I know this would be trivial (using hist3 from the statistics toolbox for example).

like image 284
user2307487 Avatar asked May 07 '13 08:05

user2307487


People also ask

What does vectorize do in MATLAB?

Vectorization is one of the core concepts of MATLAB. With one command it lets you process all elements of an array, avoiding loops and making your code more readable and efficient. For data stored in numerical arrays, most MATLAB functions are inherently vectorized.

What is vectorization optimization?

Vectorization, in simple words, means optimizing the algorithm so that it can utilize SIMD instructions in the processors. AVX, AVX2 and AVX512 are the instruction sets (intel) that perform same operation on multiple data in one instruction. for eg. AVX512 means you can operate on 16 integer values(4 bytes) at a time.

Why is vectorization faster in MATLAB?

MATLAB is designed to perform vector operations really quickly. MATLAB is an interpreted language, which is why loops are so slow in it. MATLAB sidesteps this issue by providing extremely fast (usually written in C, and optimized for the specific architecture) and well tested functions to operate on vectors.


2 Answers

Mem consuming solution

yi = limits(3) + deltay * ( 1:height ) - .5 * deltay;
xi = limits(1) + deltax * ( 1:width  ) - .5 * deltax;
dx = bsxfun( @minus, x(:), xi ) .^ 2;
dy = bsxfun( @minus, y(:), yi ) .^ 2;
dist2 = bsxfun( @plus, permute( dy, [2 3 1] ), permute( dx, [3 2 1] ) );
dmap = sum( 1./(dist2 + fudge ) , 3 );

EDIT

handling extremely large x and y by breaking the operation into blocks:

blockSize = 50000; % process up to XX elements at once
dmap = 0;
yi = limits(3) + deltay * ( 1:height ) - .5 * deltay;
xi = limits(1) + deltax * ( 1:width  ) - .5 * deltax;
bi = 1;
while bi <= numel(x)
    % take a block of x and y
    bx = x( bi:min(end, bi + blockSize - 1) );
    by = y( bi:min(end, bi + blockSize - 1) );
    dx = bsxfun( @minus, bx(:), xi ) .^ 2;
    dy = bsxfun( @minus, by(:), yi ) .^ 2;
    dist2 = bsxfun( @plus, permute( dy, [2 3 1] ), permute( dx, [3 2 1] ) );
    dmap = dmap + sum( 1./(dist2 + fudge ) , 3 );
    bi = bi + blockSize;
end     
like image 123
Shai Avatar answered Oct 04 '22 12:10

Shai


This is a good example of why starting a loop from 1 matters. The only reason that ii and jj are initiated at 0 is to kill the ii * deltay and jj * deltax terms which however introduces sequentiality in the dmap indexing, preventing parallelization.

Now, by rewriting the loops you could use parfor() after opening a matlabpool:

dmap = zeros(height, width);
yi   = limits(3) + deltay*(1:height) - .5*deltay;
matlabpool 8
parfor ii = 1: height
    for jj = 1: width
        xi    = limits(1) + (jj-1) * deltax + deltax/2;
        dist2 = (x - xi) .^ 2 + (y - yi(ii)) .^ 2;
        dmap(ii, jj) = sum(1 ./ (dist2 + fudge));
    end
end
matlabpool close

Keep in mind that opening and closing the pool has significant overhead (10 seconds on my Intel Core Duo T9300, vista 32 Matlab 2013a).

PS. I am not sure whether the inner loop instead of the outer one can be meaningfully parallelized. You can try to switch the parfor to the inner one and compare speeds (I would recommend going for the big matrix immediately since you are already running in 12 seconds and the overhead is almost as big).

like image 29
Oleg Avatar answered Oct 04 '22 13:10

Oleg