Why does it take such a long time to return to Matlab after reaching the last line of a MEX file?

Tags: c, matlab, mex

It takes roughly 14 seconds to return to the MATLAB command line after the last line of my MEX file has finished executing.

When timing the MEX file from MATLAB:

D=rand(14000)+rand(14000)*1i;
tic;
[A B C]=myMexFile(D);
toc
disp(datetime('now'));

The output is:

Elapsed time is 35.192704 seconds.
   15-Sep-2018 16:51:35

Timing the MEX file from within C, using the following minimal working example:

#include <mex.h>
#include <sys/time.h>
#include <time.h>
#include <cuComplex.h>

double getHighResolutionTime() {
    struct timeval tod;
    gettimeofday(&tod, NULL);
    double time_seconds = (double) tod.tv_sec + ((double) tod.tv_usec / 1000000.0);
    return time_seconds;
}

void double2cuDoubleComplex(cuDoubleComplex* p, double* pr, double* pi,int numElements){
    for(int j=0;j<numElements;j++){
        p[j].x=pr[j];
        p[j].y=pi[j];
    }
}

void cuDoubleComplex2double(cuDoubleComplex* p, double* pr, double* pi,int numElements){
    for(int j=0;j<numElements;j++){
        pr[j]= p[j].x;
        pi[j]= p[j].y;
    }
}

void mexFunction( int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[]) {

    double tic=getHighResolutionTime();

    int m=(int)mxGetM(prhs[0]);
    int n=(int)mxGetN(prhs[0]);
    int SIZE=m*n;

    //get pointers to input data from matlab and convert to 
    //interleaved (Fortran) ordering
    cuDoubleComplex *Gr= (cuDoubleComplex*) mxMalloc(SIZE*sizeof(cuDoubleComplex));
    double2cuDoubleComplex(Gr,mxGetPr(prhs[0]),mxGetPi(prhs[0]),SIZE);


    //modify the input data, allocate output matrices, and convert 
    //back to split (matlab) ordering.
    //outputs are m-by-n so that they hold exactly SIZE=m*n elements
    Gr[0].x=0.0;
    plhs[0] = mxCreateDoubleMatrix(m,n,mxCOMPLEX);
    cuDoubleComplex2double(Gr,mxGetPr(plhs[0]),mxGetPi(plhs[0]),SIZE);

    Gr[0].x=1.0;
    plhs[1] = mxCreateDoubleMatrix(m,n,mxCOMPLEX);
    cuDoubleComplex2double(Gr,mxGetPr(plhs[1]),mxGetPi(plhs[1]),SIZE);

    Gr[0].x=2.0;
    plhs[2] = mxCreateDoubleMatrix(m,n,mxCOMPLEX);
    cuDoubleComplex2double(Gr,mxGetPr(plhs[2]),mxGetPi(plhs[2]),SIZE);

    mxFree(Gr);

    double elapsed=getHighResolutionTime()-tic;
    mexPrintf("%f\n", elapsed);
    time_t current_time = time(NULL);
    char* c_time_string = ctime(&current_time);
    mexPrintf("time at end of MEX file %s\n", c_time_string);
}

The output is:

21.676793
time at end of MEX file Sat Sep 15 16:51:21 2018

MATLAB reports an elapsed time of 35.19 s, while the MEX file takes only 21.67 s to reach its last line. The timestamps are also ~14 seconds apart: 16:51:21 at the end of the MEX file and 16:51:35 back in MATLAB.

The outputs are very large matrices, but they are successfully allocated and initialized before the last line of the MEX file. I cannot think of anything else. What is causing this behaviour and how do I avoid it?

Update: I've tried this on more machines and the time discrepancy is still there.

Update: I've replaced the above pseudo-code with a minimal working example. Note that the above code does not actually use any GPU functionality. I'm including the cuComplex.h header just to use the cuDoubleComplex datatype.

asked Sep 15 '18 03:09 by avgn



1 Answer

As of MATLAB R2018a, MATLAB internally stores complex arrays in an interleaved format. In previous versions, MATLAB used two separate memory blocks to store complex data: one for the real values and one for the imaginary values. In a MEX-file, you used mxGetPr() and mxGetPi() to get pointers to these two memory blocks (these functions are referred to as the "Separate Complex API").

Starting with R2018a, with the new internal data representation, MEX-files can be compiled in two different ways:

  1. A compatibility mode (the default; add -R2017b to the mex command to force it), in which old MEX-files compile without modification. These MEX-files thus use the "Separate Complex API". MATLAB copies complex data from its new interleaved representation into separate real and imaginary memory blocks before executing the MEX-file code, and copies any complex output arrays back into the interleaved format after it returns. For large arrays these copies take a noticeable amount of time, and they are the cause of the delay observed by the OP.

  2. A new mode (add -R2018a to the mex command), where MEX-files use the new "Interleaved Complex API". That is, the MEX-file code is adapted to use the new interleaved complex format. Since most C and C++ libraries you might want to call from your MEX-file use an interleaved format, this is actually a big advantage.
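To get a feel for how much data the compatibility mode has to shuffle in the OP's case, here is a small back-of-the-envelope sketch in C. The function names are made up for illustration, and it assumes MATLAB converts each complex input and each complex output exactly once; the real internals may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by a rows-by-cols complex double array:
   two doubles (real and imaginary part) per element. */
uint64_t complex_double_bytes(uint64_t rows, uint64_t cols)
{
    const uint64_t bytes_per_element = 2 * sizeof(double);
    /* Guard against overflow of the product below. */
    assert(cols == 0 || rows <= UINT64_MAX / bytes_per_element / cols);
    return rows * cols * bytes_per_element;
}

/* Total bytes converted between the two layouts for one call,
   ASSUMING one conversion per complex input and per complex output. */
uint64_t compat_mode_bytes(uint64_t rows, uint64_t cols,
                           unsigned n_inputs, unsigned n_outputs)
{
    return (uint64_t)(n_inputs + n_outputs) * complex_double_bytes(rows, cols);
}
```

With 8-byte doubles, a 14000-by-14000 complex array is 196,000,000 elements of 16 bytes each, i.e. 3,136,000,000 bytes (about 3.1 GB). The three outputs of the OP's MEX-file alone add up to roughly 9.4 GB that would have to be rewritten into the interleaved layout after the last line of the MEX-file has run.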

The solution to avoid the large delay at the start and end of MEX-files that process complex arrays is to rewrite them to use the new "Interleaved Complex API". This requires the following changes:

  • Find all uses of the mxGetPr() and mxGetPi() functions. mxGetPi() is no longer available, and mxGetPr() now throws an error if the input array is complex-valued. Instead, use mxGetData(), which returns a pointer to the interleaved complex data. Note that the documentation recommends against using mxGetData() for numeric data; it seems MathWorks prefers that you use the new "typed data access functions", such as mxGetComplexDoubles() for complex double arrays. mxGetImagData(), like mxGetPi(), no longer exists.

  • The same is true for the functions that set the data pointer (mxSet...()).

  • Don't forget to check if the input array actually is complex and of type double, using mxIsComplex() and mxIsDouble().

  • The function mxGetElementSize now returns 16 for complex double data, not 8 like it used to.

  • Compile your MEX-file using mex -R2018a <filename>.
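As a rough illustration of these changes, the OP's example could be reworked along the following lines. This is only a sketch, not a drop-in replacement: the helper copy_and_set_first_real() and the error identifiers are invented here, it creates only as many outputs as the caller requests, and it uses the typed accessor mxGetComplexDoubles() rather than mxGetData(). The mexFunction part is compiled only when built through the mex command, which defines MATLAB_MEX_FILE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n interleaved complex values (2*n doubles laid
   out as re, im, re, im, ...) and overwrite the real part of element 0. */
void copy_and_set_first_real(double *dst, const double *src, size_t n,
                             double first_real)
{
    if (n == 0) {
        return;                      /* nothing to copy for empty arrays */
    }
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, 2 * n * sizeof(double));
    dst[0] = first_real;
}

#ifdef MATLAB_MEX_FILE   /* defined by the mex command; skipped elsewhere */
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    /* Check that the input really is a complex double array. */
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || !mxIsComplex(prhs[0])) {
        mexErrMsgIdAndTxt("myMexFile:input",
                          "Expected one complex double matrix.");
    }
    if (nlhs > 3) {
        mexErrMsgIdAndTxt("myMexFile:output", "At most three outputs.");
    }

    size_t m = mxGetM(prhs[0]);
    size_t n = mxGetN(prhs[0]);

    /* Typed accessor of the interleaved API: the data is used in place,
       without converting to or from split real/imaginary storage. */
    const mxComplexDouble *in = mxGetComplexDoubles(prhs[0]);

    int nout = (nlhs > 0) ? nlhs : 1;   /* plhs[0] may always be assigned */
    for (int k = 0; k < nout; k++) {
        plhs[k] = mxCreateDoubleMatrix(m, n, mxCOMPLEX);
        mxComplexDouble *out = mxGetComplexDoubles(plhs[k]);
        /* mxComplexDouble holds two doubles (real, imag), so the buffers
           are viewed here as 2*m*n consecutive doubles. */
        copy_and_set_first_real((double *)out, (const double *)in,
                                m * n, (double)k);
    }
}
#endif
```

Assuming the file is saved as myMexFile.c, it would be built with mex -R2018a myMexFile.c. Because the input and output buffers are already interleaved, the intermediate cuDoubleComplex copy from the original example is no longer needed either.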

Here are some more troubleshooting tips.

answered Oct 20 '22 01:10 by Cris Luengo
