I have some Matlab code which needs to be sped up. Through profiling, I've identified a particular function as the culprit in slowing down the execution. This function is called hundreds of thousands of times within a loop.
My first thought was to convert the function to mex (using Matlab Coder) to speed it up. However, common programming sense tells me the interface between Matlab and the mex code would lead to some overhead, which means calling this mex function thousands of times might not be a good idea. Is this correct? Or does Matlab do some magic when it's the same mex being called repeatedly to remove the overhead?
If there is significant overhead, I'm thinking of restructuring the code so as to add the loop to the function itself and then creating a mex of that. Before doing that, I would like to validate my assumption to justify the time spent on this.
Update:
I tried @angainor's suggestion, and created donothing.m with the following code:
function nothing = donothing(dummy) %#codegen
nothing = dummy;
end
Then, I created a mex function from this as donothing_mex, and tried the following code:
tic;
for i=1:1000000
donothing_mex(5);
end
toc;
The result was that a million calls to the function took about 9 seconds. This is not a significant overhead for our purposes, so for now I think I will convert the called function alone to mex. However, calling a function from a loop that executes about a million times does seem a pretty stupid idea in retrospect, considering this is performance-critical code, so moving the loop into the mex function is still on the books, but with much lower priority.
As usual, it all depends on the amount of work you do in the MEX file. The overhead of calling a MEX function is constant and does not depend on, e.g., the problem size. This means that arguments are not copied to new, temporary arrays. Hence, if the function does enough work, the MATLAB overhead of calling the MEX file will not show. In my experience, the MEX call overhead is significant only the first time the MEX function is called, when the dynamic library has to be loaded, symbols resolved, etc. Subsequent MEX calls have very little overhead and are very efficient.
Almost everything in MATLAB comes with some overhead due to the nature of this high-level language, unless you have code that you are sure is fully compiled by the JIT (but then you do not need a MEX file :)). So you are choosing one overhead over the other.
To sum up: I would not be too scared of the MEX calling overhead.
Edit: As is often heard here and elsewhere, the only reasonable thing to do in any particular case is of course to BENCHMARK and check it for yourself. You can easily estimate the MEX call overhead by writing a trivial MEX function:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
}
On my computer you get
tic; for i=1:1000000; mexFun; end; toc
Elapsed time is 2.104849 seconds.
That is about 2e-6 s of overhead per MEX call. Add your code, time it, and see whether the overhead is at an acceptable level.
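To make "acceptable level" a bit more concrete, here is a minimal sketch of the arithmetic. The helper names `per_call_overhead` and `overhead_fraction` are made up for illustration and are not part of any MATLAB or MEX API. It assumes you have measured the empty-MEX loop time as above and have a rough estimate of the useful work per call.

```c
/* Hypothetical helpers for interpreting the timings above.
   Not part of the MEX API; plain C arithmetic only. */

/* Average cost of one call in seconds, e.g. 2.104849 s / 1e6 calls. */
double per_call_overhead(double elapsed_seconds, double n_calls)
{
    return elapsed_seconds / n_calls;
}

/* Fraction of each call's wall time that is pure call overhead.
   Both arguments are in seconds; assumes overhead + work > 0. */
double overhead_fraction(double overhead, double work)
{
    return overhead / (overhead + work);
}
```

For example, with roughly 2e-6 s of overhead and 2e-4 s of real work per call, `overhead_fraction(2e-6, 2e-4)` comes out just under 1%, so the call overhead would barely show. If the work per call is itself only a few microseconds, the overhead dominates.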
As Andrew Janke noted below (thanks!), the MEX function overhead apparently depends on the number of arguments you pass to the MEX function. It is a small dependence, but it is there:
a = ones(1000,1);
tic; for i=1:1000000; mexFun(a); end; toc
Elapsed time is 2.41 seconds.
It is not related to the size of a:
a = ones(1000000,1);
tic; for i=1:1000000; mexFun(a); end; toc
Elapsed time is 2.41805 seconds.
But it is related to the number of arguments:
a = ones(1000000,1);
b = ones(1000000,1);
tic; for i=1:1000000; mexFun(a, b); end; toc
Elapsed time is 2.690237 seconds.
So you might want to take that into account in your tests.
You should absolutely, without any hesitation, move the loop inside the mex file. The example below demonstrates a roughly 1000-fold speedup for a virtually empty work unit in a for loop. Obviously, as the amount of work per iteration grows, this speedup will shrink.
Here is an example of the difference:
Mex function without internal loop:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
int i=1;
plhs[0] = mxCreateDoubleScalar(i);
}
Called in Matlab:
tic;for i=1:1000000;donothing();end;toc
Elapsed time is 3.683634 seconds.
Mex function with internal loop:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
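/* Number of iterations, passed in from MATLAB as a double scalar */
int M = (int) mxGetScalar(prhs[0]);
/* M-by-1 real double output vector */
plhs[0] = mxCreateNumericMatrix(M, 1, mxDOUBLE_CLASS, mxREAL);
double* mymat = mxGetPr(plhs[0]);
/* The loop now runs inside the MEX function */
for (int i=0; i< M; i++)
mymat[i] = M-i;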
}
Called in Matlab:
tic; a = donothing(1000000); toc
Elapsed time is 0.003350 seconds.
Well, this is the fastest I can make it in Matlab:
%#eml
function L = test(s,t)
m = numel(s);
n = numel(t);
% trivial cases
if m==0 && n==0
L = 0; return; end
if n==0
L = m; return; end
if m==0
L = n; return; end
% non-trivial cases
M = zeros(m+1,n+1);
M(:,1) = 0:m;
for j = 2:n+1
for i = 2:m+1
M(i,j) = min([
M(i-1,j) + 1
M(i,j-1) + 1
M(i-1,j-1) + (s(i-1)~=t(j-1));
]);
end
end
L = min(M(end,:));
end
Can you compile this and run some tests? (For some weird reason, compilation fails to work on my installation...) Perhaps change %#eml to %#codegen first, if you think that's easier.
NOTE: for the C version, you should also interchange the for-loops, so that the loop over j is the inner one.
Also, the row1 and row2 approach is a lot more memory efficient. If you're going to compile anyway, I'd use that approach.
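As a rough sketch of what such a C version might look like: the code below assumes the "row1 and row2 approach" means keeping only two rows of M instead of the full (m+1)-by-(n+1) matrix, and it makes the loop over j the inner one, as the note suggests. The function name `fuzzy_distance` and the use of C strings are my own assumptions for illustration. The recurrence, the boundary values and the trivial cases are taken from the MATLAB function above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-row version of the MATLAB function test(s,t) above.
   Returns -1 if memory allocation fails. */
int fuzzy_distance(const char *s, const char *t)
{
    size_t m = strlen(s), n = strlen(t);

    /* trivial cases, as in the MATLAB code */
    if (n == 0) return (int)m;   /* also covers m == 0 && n == 0 */
    if (m == 0) return (int)n;

    /* calloc zero-fills, matching the first row of M = zeros(m+1,n+1) */
    int *prev = calloc(n + 1, sizeof *prev);
    int *cur  = calloc(n + 1, sizeof *cur);
    if (!prev || !cur) { free(prev); free(cur); return -1; }

    for (size_t i = 1; i <= m; i++) {
        cur[0] = (int)i;                      /* M(:,1) = 0:m */
        for (size_t j = 1; j <= n; j++) {     /* inner loop over j */
            int up   = prev[j] + 1;
            int left = cur[j - 1] + 1;
            int diag = prev[j - 1] + (s[i - 1] != t[j - 1]);
            int best = up < left ? up : left;
            cur[j] = best < diag ? best : diag;
        }
        /* the row just computed becomes the previous row */
        int *tmp = prev; prev = cur; cur = tmp;
    }

    /* after the final swap, prev holds the last row: L = min(M(end,:)) */
    int result = prev[0];
    for (size_t j = 1; j <= n; j++)
        if (prev[j] < result) result = prev[j];

    free(prev);
    free(cur);
    return result;
}
```

Memory use drops from (m+1)*(n+1) entries to 2*(n+1). To call it from MATLAB you would still need a mexFunction wrapper around it (or let MATLAB Coder generate one from an equivalent two-row .m file), which is not shown here.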