
How to write vectorized functions in MATLAB

I am just learning MATLAB and I find it hard to understand the performance factors of loops vs vectorized functions.

In my previous question, "Nested for loops extremely slow in MATLAB (preallocated)", I realized that using a vectorized function instead of 4 nested loops made a 7x difference in running time.

In that example instead of looping through all dimensions of a 4 dimensional array and calculating median for each vector, it was much cleaner and faster to just call median(stack, n) where n meant the working dimension of the median function.

But median is just a very easy example and I was just lucky that it had this dimension parameter implemented.

My question is: how do you write a function yourself that works as efficiently as one that has this dimension parameter implemented?

For example you have a function my_median_1D which only works on a 1-D vector and returns a number.

How do you write a function my_median_nD which acts like MATLAB's median, by taking an n-dimensional array and a "working dimension" parameter?

Update

I found the code for calculating median in higher dimensions

% In all other cases, use linear indexing to determine exact location
% of medians.  Use linear indices to extract medians, then reshape at
% end to appropriate size.
cumSize = cumprod(s);
total = cumSize(end);            % Equivalent to NUMEL(x)
numMedians = total / nCompare;

numConseq = cumSize(dim - 1);    % Number of consecutive indices
increment = cumSize(dim);        % Gap between runs of indices
ixMedians = 1;

y = repmat(x(1),numMedians,1);   % Preallocate appropriate type

% Nested FOR loop tracks down medians by their indices.
for seqIndex = 1:increment:total
  for consIndex = half*numConseq:(half+1)*numConseq-1
    absIndex = seqIndex + consIndex;
    y(ixMedians) = x(absIndex);
    ixMedians = ixMedians + 1;
  end
end

% Average in second value if n is even
if 2*half == nCompare
  ixMedians = 1;
  for seqIndex = 1:increment:total
    for consIndex = (half-1)*numConseq:half*numConseq-1
      absIndex = seqIndex + consIndex;
      y(ixMedians) = meanof(x(absIndex),y(ixMedians));
      ixMedians = ixMedians + 1;
    end
  end
end

% Check last indices for NaN
ixMedians = 1;
for seqIndex = 1:increment:total
  for consIndex = (nCompare-1)*numConseq:nCompare*numConseq-1
    absIndex = seqIndex + consIndex;
    if isnan(x(absIndex))
      y(ixMedians) = NaN;
    end
    ixMedians = ixMedians + 1;
  end
end

Could you explain why this code is so efficient compared to the simple nested loops? It has nested loops just like the other function.

I don't understand how it could be 7x faster, nor why it is so complicated.

Update 2

I realized that median was not a good example, as it is a complicated function itself, requiring sorting of the array or other neat tricks. I re-did the tests with mean instead and the results are even more crazy: 19 seconds vs 0.12 seconds. That means the built-in function is about 160 times faster than the nested loops.

It is really hard for me to understand how an industry-leading language can have such an extreme performance difference based on programming style, but I see the points mentioned in the answers below.

hyperknot asked Oct 18 '11 21:10



2 Answers

Update 2 (to address your updated question)

MATLAB is optimized to work well with arrays. Once you get used to it, it is actually really nice to type just one line and have MATLAB do the full 4D looping itself without having to worry about it. MATLAB is often used for prototyping and one-off calculations, so it makes sense to save time for the person coding, at the cost of some of the flexibility of C[++|#].

This is why MATLAB internally does some loops really well - often by coding them as a compiled function.
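To see the difference for yourself, a rough timing sketch along the lines of your mean test could look like this. The array size and the variable names are made up for illustration, and the actual timings will depend on your machine and MATLAB version:

```matlab
stack = rand(50, 50, 3, 20);            % hypothetical 4-D test data
[H, W, C, N] = size(stack);

% Explicit loops: one scalar result per iteration
tic
m_loop = zeros(H, W, C);
for i = 1:H
    for j = 1:W
        for k = 1:C
            m_loop(i, j, k) = mean(stack(i, j, k, :));
        end
    end
end
t_loop = toc;

% Built-in with a dimension argument: the looping happens in compiled code
tic
m_vec = mean(stack, 4);
t_vec = toc;

fprintf('loops: %.4f s, mean(stack,4): %.4f s\n', t_loop, t_vec);
```

Both versions compute the same 50-by-50-by-3 result; only where the looping happens differs.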

The code snippet you give doesn't really contain the relevant line of code which does the main work, namely

% Sort along given dimension
x = sort(x,dim);

In other words, the code you show only needs to access the median values by their correct index in the now-sorted multi-dimensional array x (which doesn't take much time). The actual work accessing all array elements was done by sort, which is a built-in (i.e. compiled and highly optimized) function.
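If you want to write the my_median_nD from your question yourself, one possible approach is to let the built-in sort do the heavy lifting and only rearrange the data around it. The following is a minimal sketch, not the method used inside MATLAB's own median: it assumes non-empty floating-point input without NaNs, and it uses permute/reshape instead of the linear-index bookkeeping shown in your snippet.

```matlab
function y = my_median_nD(x, dim)
% Median along dimension DIM; sketch for non-empty floating-point input
% without NaNs.
sz = size(x);
nd = max(ndims(x), dim);
sz(end+1:nd) = 1;                       % pad size if dim > ndims(x)
n = sz(dim);                            % number of values per median

% Bring DIM to the front and flatten the rest: each column is one vector
perm = [dim, 1:dim-1, dim+1:nd];
cols = reshape(permute(x, perm), n, []);

cols = sort(cols, 1);                   % built-in sort does the real work
half = floor(n / 2);
if mod(n, 2) == 1
    y = cols(half + 1, :);              % odd count: middle element
else
    y = (cols(half, :) + cols(half + 1, :)) / 2;   % even count: average
end

sz(dim) = 1;                            % DIM collapses to a singleton
y = reshape(y, sz);
end
```

Calling my_median_nD(stack, 4) on a 4-D array should then agree with median(stack, 4) up to floating-point rounding. There are no explicit loops over elements; every step is a built-in that works on the whole array.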

Original answer (about how to build your own fast functions working on arrays)

There are actually quite a few built-ins that take a dimension parameter: min(stack, [], n), max(stack, [], n), mean(stack, n), std(stack, [], n), median(stack,n), sum(stack, n)... In addition, other built-in functions like exp() and sin() automatically work on each element of your whole array (i.e. sin(stack) automatically does four nested loops for you if stack is 4D). Together, these let you build up a lot of the functions you might need just by relying on the existing built-ins.

If this is not enough for a particular case, you should have a look at repmat, bsxfun, arrayfun and accumarray, which are very powerful functions for doing things "the MATLAB way". Just search on SO for questions (or rather answers) using one of these; I learned a lot about MATLAB's strong points that way.
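As a small illustration of two of these, here is a sketch that subtracts the mean along a working dimension from every element. The data and the variable names (stack, n, centered) are placeholders chosen for the example:

```matlab
stack = rand(4, 5, 6);                  % hypothetical 3-D data
n = 2;                                  % working dimension

% mean(stack, n) has size 1 along n; bsxfun expands it to match stack
centered = bsxfun(@minus, stack, mean(stack, n));

% Same result with repmat, which builds the expanded array explicitly
reps = ones(1, ndims(stack));
reps(n) = size(stack, n);
centered2 = stack - repmat(mean(stack, n), reps);
```

In MATLAB R2016b and later, implicit expansion lets you write stack - mean(stack, n) directly, but the bsxfun form also works in older releases.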

As an example, say you wanted to implement the p-norm of stack along dimension n, you could write

function result = pnorm(stack, p, n)
% p-norm of stack along dimension n (elementwise power, not matrix power)
result = sum(abs(stack).^p, n).^(1/p);

... where you effectively reuse the "which-dimension-capability" of sum.
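A quick sanity check of this idea, with pnorm restated as an anonymous function so the snippet runs on its own. Note the elementwise .^ for the outer power, which is needed because the sum is an array rather than a scalar; the test data is made up:

```matlab
% pnorm restated as an anonymous function so this snippet is self-contained
pnorm = @(stack, p, n) sum(abs(stack).^p, n).^(1/p);

stack = rand(4, 3, 5);                  % hypothetical test data
r = pnorm(stack, 2, 3);                 % Euclidean norm along dimension 3

size(r)                                 % 4-by-3: dimension 3 is collapsed
```

For p = 2 the result should match sqrt(sum(stack.^2, 3)) up to rounding.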

Update

As Max points out in the comments, also have a look at the colon operator (:), which is a very powerful tool for selecting elements from an array (or even changing its shape, which is more generally done with reshape).
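A few typical uses, sketched on a small made-up array so the results are easy to follow:

```matlab
A = reshape(1:24, [2 3 4]);             % hypothetical 2-by-3-by-4 array

v     = A(:);                           % all elements as a 24-by-1 column
slice = A(:, :, 2);                     % second 2-by-3 page
col   = A(:, 1, 1);                     % first column of the first page

B = reshape(A, 6, []);                  % same data viewed as 6-by-4
```

Neither A(:) nor reshape changes the order of the elements in memory; they only change how the same data is indexed.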

In general, have a look at the section Array Operations in the help - it contains repmat et al. mentioned above, but also cumsum and some more obscure helper functions which you should use as building blocks.

Jonas Heidelberg answered Nov 07 '22 08:11


Vectorization

In addition to what's already been said, you should also understand that vectorization can involve parallelization, i.e. performing concurrent operations on data as opposed to sequential execution (think SIMD instructions), and even taking advantage of threads and multiprocessors in some cases...

MEX-files

Now although the "interpreted vs. compiled" point has already been argued, no one mentioned that you can extend MATLAB by writing MEX-files, which are compiled executables written in C that can be called directly like normal functions from inside MATLAB. This allows you to implement performance-critical parts in a lower-level language like C.

Column-major order

Finally, when trying to optimize some code, always remember that MATLAB stores matrices in column-major order. Accessing elements in that order can yield significant improvements compared to other arbitrary orders.

For example, in your previous linked question, you were computing the median of a set of stacked images along some dimension. The order in which those dimensions are arranged greatly affects the performance. Illustration:

%# sequence of 10 images
fPath = fullfile(matlabroot,'toolbox','images','imdemos');
files = dir( fullfile(fPath,'AT3_1m4_*.tif') );
files = strcat(fPath,{filesep},{files.name}');      %'

I = imread( files{1} );

%# stacked images along the 1st dimension: [numImages H W RGB]
stack1 = zeros([numel(files) size(I) 3], class(I));
for i=1:numel(files)
    I = imread( files{i} );
    stack1(i,:,:,:) = repmat(I, [1 1 3]);   %# grayscale to RGB
end

%# stacked images along the 4th dimension: [H W RGB numImages]
stack4 = permute(stack1, [2 3 4 1]);

%# compute median image from each of these two stacks
tic, m1 = squeeze( median(stack1,1) ); toc
tic, m4 = median(stack4,4); toc
isequal(m1,m4)

The timing difference was huge:

Elapsed time is 0.257551 seconds.     %# stack1
Elapsed time is 17.405075 seconds.    %# stack4
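The same memory-layout effect can be probed with plain loops. This is only a sketch on a made-up matrix; how large the gap is (if any) depends on the matrix size, the machine, and the MATLAB version:

```matlab
A = rand(1000);                         % hypothetical test matrix
[nRows, nCols] = size(A);

% The linear index of A(r, c) is r + (c-1)*nRows, so elements in the same
% column sit next to each other in memory.

% Inner loop walks down a column: consecutive memory addresses
tic
s1 = 0;
for c = 1:nCols
    for r = 1:nRows
        s1 = s1 + A(r, c);
    end
end
tColumnWise = toc;

% Inner loop walks along a row: jumps of nRows elements between accesses
tic
s2 = 0;
for r = 1:nRows
    for c = 1:nCols
        s2 = s2 + A(r, c);
    end
end
tRowWise = toc;

fprintf('column-wise: %.4f s, row-wise: %.4f s\n', tColumnWise, tRowWise);
```

This matches the stacks above: in stack1 the values that go into each median are adjacent in memory, while in stack4 they are spread far apart.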
Amro answered Nov 07 '22 07:11