 

Matlab: Does calling the same mex function repeatedly from a loop incur too much overhead?

I have some Matlab code which needs to be sped up. Through profiling, I've identified a particular function as the culprit in slowing down the execution. This function is called hundreds of thousands of times within a loop.

My first thought was to convert the function to mex (using Matlab Coder) to speed it up. However, common programming sense tells me the interface between Matlab and the mex code would lead to some overhead, which means calling this mex function thousands of times might not be a good idea. Is this correct? Or does Matlab do some magic when it's the same mex being called repeatedly to remove the overhead?

If there is significant overhead, I'm thinking of restructuring the code so as to add the loop to the function itself and then creating a mex of that. Before doing that, I would like to validate my assumption to justify the time spent on this.

Update:

I tried @angainor's suggestion, and created donothing.m with the following code:

function nothing = donothing(dummy) %#codegen
nothing = dummy;
end

Then, I created a mex function from this as donothing_mex, and tried the following code:

tic;
for i=1:1000000
    donothing_mex(5);
end
toc;

The result was that a million calls to the function took about 9 seconds, i.e. roughly 9 microseconds per call. This is not a significant overhead for our purposes, so for now I think I will convert the called function alone to mex. However, calling a function from a loop that executes about a million times does seem a pretty stupid idea in retrospect, considering this is performance-critical code, so moving the loop into the mex function is still on the books, but with much lower priority.

Sundar R asked Oct 16 '12 19:10


3 Answers

As usual, it all depends on the amount of work you do in the MEX file. The overhead of calling a MEX function is constant and does not depend on, e.g., the problem size, because the arguments are not copied to new, temporary arrays. Hence, if the function does enough work, the MATLAB overhead of calling the MEX file will not show. In my experience, the MEX call overhead is significant only the first time the MEX function is called, when the dynamic library has to be loaded, symbols resolved, etc. Subsequent MEX calls have very little overhead and are very efficient.

Almost everything in MATLAB comes with some overhead due to the nature of this high-level language, unless you have code that you are sure is fully compiled by the JIT (but then you do not need a MEX file :)). So you are choosing one overhead over the other.

To sum up: I would not be too scared of the MEX call overhead.

Edit: As often heard here and elsewhere, the only reasonable thing to do in any particular case is of course to BENCHMARK and check it for yourself. You can easily estimate the MEX call overhead by writing a trivial MEX function:

#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ]) 
{      
}

On my computer you get

tic; for i=1:1000000; mexFun; end; toc
Elapsed time is 2.104849 seconds.

That is about 2e-6 s of overhead per MEX call. Add your code, time it, and see whether the overhead is at an acceptable level.

As Andrew Janke noted below (thanks!), the MEX function overhead apparently depends on the number of arguments you pass to the MEX function. It is a small dependence, but it is there:

a = ones(1000,1);
tic; for i=1:1000000; mexFun(a); end; toc
Elapsed time is 2.41 seconds.

It is not related to the size of a:

a = ones(1000000,1);
tic; for i=1:1000000; mexFun(a); end; toc
Elapsed time is 2.41805 seconds.

But it is related to the number of arguments:

a = ones(1000000,1);
b = ones(1000000,1);
tic; for i=1:1000000; mexFun(a, b); end; toc
Elapsed time is 2.690237 seconds.

So you might want to take that into account in your tests.
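To make the arithmetic behind these numbers explicit, here is a small C sketch. The function names are made up for illustration; the only inputs are the tic/toc totals quoted above and the loop count of 1000000.

```c
/* Average cost of one call in microseconds, given the total elapsed
   time in seconds (as reported by tic/toc) and the number of calls. */
double per_call_us(double elapsed_s, double n_calls)
{
    return elapsed_s / n_calls * 1e6;
}

/* Extra cost in microseconds attributable to one additional argument,
   estimated from two timings that differ only in the argument count. */
double per_extra_arg_us(double elapsed_more_args_s,
                        double elapsed_fewer_args_s,
                        double n_calls)
{
    return (elapsed_more_args_s - elapsed_fewer_args_s) / n_calls * 1e6;
}
```

With the timings above, per_call_us(2.104849, 1e6) is about 2.1 microseconds for the empty call, and each additional argument adds roughly 0.3 microseconds (both the zero-to-one and the one-to-two argument differences land near that value). These are single-run numbers from one machine, so treat them as rough estimates.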

angainor answered Oct 21 '22 01:10


You should absolutely, without any hesitation, move the loop inside the MEX file. The example below demonstrates a roughly 1000-fold speedup for a virtually empty work unit in a for loop. Obviously, as the amount of work per iteration grows, this speedup will decrease.

Here is an example of the difference:

Mex function without internal loop:

#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ]) 
{      
    int i=1;    
    plhs[0] = mxCreateDoubleScalar(i);
}

Called in Matlab:

tic;for i=1:1000000;donothing();end;toc
Elapsed time is 3.683634 seconds.

Mex function with internal loop:

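#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ]) 
{      
    /* Number of iterations, passed from MATLAB as a scalar double */
    int M = (int) mxGetScalar(prhs[0]);
    /* M-by-1 real double output vector */
    plhs[0] = mxCreateNumericMatrix(M, 1, mxDOUBLE_CLASS, mxREAL);
    double* mymat = mxGetPr(plhs[0]);
    /* The loop runs inside the MEX file, so the call overhead is paid once */
    for (int i=0; i< M; i++)
        mymat[i] = M-i;
}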

Called in Matlab:

tic; a = donothing(1000000); toc
Elapsed time is 0.003350 seconds.
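How quickly the speedup shrinks as the per-iteration work grows can be estimated with a simple cost model. The sketch below is only that: it assumes a fixed overhead per MEX call and a fixed amount of useful work per iteration, and the function name is hypothetical.

```c
/* Estimated speedup from moving an n_iter-iteration loop inside the MEX
   function. overhead_s is the fixed cost of one MEX call in seconds and
   work_s is the useful work per iteration in seconds; both are values
   you would measure yourself. */
double loop_in_mex_speedup(double n_iter, double overhead_s, double work_s)
{
    double loop_in_matlab = n_iter * (overhead_s + work_s); /* n_iter calls */
    double loop_in_mex    = overhead_s + n_iter * work_s;   /* one call     */
    return loop_in_matlab / loop_in_mex;
}
```

Plugging in values derived from the two timings above (about 3.68e-6 s per call and about 3.35e-9 s per iteration of the internal loop) gives a speedup of roughly 1100 for a million iterations, consistent with the measurement. With 1e-4 s of real work per iteration, the same model predicts a speedup of under 4 percent, so the restructuring pays off mainly when the work unit is tiny.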
twerdster answered Oct 21 '22 01:10


Well, this is the fastest I can make it in Matlab:

%#eml
function L = test(s,t)

    m = numel(s);
    n = numel(t);

    % trivial cases
    if m==0 && n==0
        L = 0; return; end
    if n==0
        L = m; return; end
    if m==0
        L = n; return; end

    % non-trivial cases
    M = zeros(m+1,n+1);    
    M(:,1) = 0:m;

    for j = 2:n+1
        for i = 2:m+1
            M(i,j) = min([
                M(i-1,j) + 1
                M(i,j-1) + 1
                M(i-1,j-1) + (s(i-1)~=t(j-1));
                ]);
        end
    end

    L = min(M(end,:));

end

Can you compile this and run some tests? (For some weird reason, compilation fails to work on my installation...) Perhaps change %#eml to %#codegen first, if you think that's easier.

NOTE: for the C version, you should also interchange the for-loops, so that the loop over j is the inner one.

Also, the row1 and row2 approach is a lot more memory efficient. If you're going to compile anyway, I'd use that approach.
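For reference, here is a minimal C sketch of a two-row variant, assuming that "row1 and row2" means keeping only the previous and the current row of M instead of the full matrix. It mirrors the recurrence, the trivial cases, and the final min over the last row of the Matlab function above, with the loop over j as the inner one. The function name and the use of NUL-terminated char strings are my assumptions, not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Two-row version of test(s,t) above.
   Returns (size_t)-1 if memory allocation fails. */
size_t edit_distance_two_rows(const char *s, const char *t)
{
    size_t m = strlen(s), n = strlen(t);

    /* trivial cases, as in the Matlab code */
    if (m == 0) return n;
    if (n == 0) return m;

    size_t *prev = calloc(n + 1, sizeof *prev);   /* row i-1; row 0 is all zeros */
    size_t *cur  = malloc((n + 1) * sizeof *cur); /* row i */
    if (!prev || !cur) { free(prev); free(cur); return (size_t)-1; }

    for (size_t i = 1; i <= m; i++) {
        cur[0] = i;                               /* first column holds 0..m */
        for (size_t j = 1; j <= n; j++) {
            cur[j] = min3(prev[j] + 1,
                          cur[j - 1] + 1,
                          prev[j - 1] + (s[i - 1] != t[j - 1]));
        }
        size_t *tmp = prev; prev = cur; cur = tmp; /* current row becomes previous */
    }

    /* after the final swap, prev holds the last row: take its minimum */
    size_t best = prev[0];
    for (size_t j = 1; j <= n; j++)
        if (prev[j] < best) best = prev[j];

    free(prev);
    free(cur);
    return best;
}
```

Memory use drops from (m+1)*(n+1) entries to 2*(n+1). Note that because the first row is zero and the result is the minimum over the last row, this computes the best match of s against any substring of t, so edit_distance_two_rows("abc", "xxabcxx") is 0.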

Rody Oldenhuis answered Oct 21 '22 01:10