
Calling multithreaded MKL from an OpenMP parallel region

I have code with the following structure:

#pragma omp parallel
{
    #pragma omp for nowait
    {
        // first for loop
    }

    #pragma omp for nowait
    {
        // second for loop
    }

    #pragma omp barrier

    <-- #pragma omp single/critical/atomic --> not sure
    dgemm_(....)

    #pragma omp for
    {
        // yet another for loop
    }
}

For dgemm_, I link with multithreaded MKL. I want MKL to use all 8 available threads. What is the best way to do so?

arbitUser1401 asked Dec 21 '13 04:12


1 Answer

This is a case of nested parallelism. It is supported by MKL, but it only works if your executable is built using the Intel C/C++ compiler. The reason for that restriction is that MKL uses Intel's OpenMP runtime and that different OMP runtimes do not play well with each other.

Once that is sorted out, enable nested parallelism by setting OMP_NESTED to TRUE, and disable MKL's detection of nested parallelism by setting MKL_DYNAMIC to FALSE. If the data to be processed with dgemm_ is shared, then you have to invoke dgemm_ from within a single construct, so that only one thread makes the call. If each thread processes its own private data, then you don't need any synchronisation constructs, but using multithreaded MKL won't give you any benefit either. Therefore I would assume that your case is the former.

To summarise:

#pragma omp single
dgemm_(...);

and run with:

$ MKL_DYNAMIC=FALSE MKL_NUM_THREADS=8 OMP_NUM_THREADS=8 OMP_NESTED=TRUE ./exe
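
For illustration, here is a minimal sketch of the whole region from the question with the single construct in place. It does not link against MKL: matmul_standin is a hypothetical stand-in for dgemm_ that opens its own inner parallel region, roughly the way a multithreaded BLAS would, and pipeline, the fill values and the final scaling loop are likewise made up for the example. Whether the inner region actually gets more than one thread depends on the nesting settings described above.

```c
#include <assert.h>

/* Hypothetical stand-in for dgemm_: c = a * b for n x n row-major matrices.
   The inner parallel loop mimics the threading a multithreaded BLAS does. */
static void matmul_standin(int n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a && b && c);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same shape as the region in the question: two independent loops,
   a barrier, one shared multiply, then another work-shared loop. */
void pipeline(int n, double *a, double *b, double *c)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n * n; i++)
            a[i] = 1.0;

        #pragma omp for nowait
        for (int i = 0; i < n * n; i++)
            b[i] = 2.0;

        /* Both inputs must be complete before the multiply starts. */
        #pragma omp barrier

        /* Exactly one thread makes the call; the implicit barrier at the
           end of single keeps the others from reading c too early. */
        #pragma omp single
        matmul_standin(n, a, b, c);

        #pragma omp for
        for (int i = 0; i < n * n; i++)
            c[i] *= 0.5;
    }
}
```

With critical instead of single, every thread would perform the multiply one after another, and atomic does not apply to a function call at all, which is why single is the construct to use here.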

You could also set the parameters with the appropriate calls:

mkl_set_dynamic(0);
mkl_set_num_threads(8);
omp_set_nested(1);

#pragma omp parallel num_threads(8) ...
{
   ...
}

though I would prefer to use environment variables instead.
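
One note on the OpenMP call: omp_set_nested (and OMP_NESTED) were deprecated in OpenMP 5.0 in favour of omp_set_max_active_levels (and OMP_MAX_ACTIVE_LEVELS). A rough sketch of the OpenMP side only, assuming a runtime that implements OpenMP 3.0 or later; allow_nested_levels is a hypothetical helper name, and the MKL calls are left out so that the snippet compiles without MKL:

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: permit `levels` nested active parallel regions
   (2 = the outer region plus the library's inner one). Returns the limit
   the runtime actually accepted, or 1 when built without OpenMP. */
int allow_nested_levels(int levels)
{
    assert(levels >= 1);
#ifdef _OPENMP
    omp_set_max_active_levels(levels);
    return omp_get_max_active_levels();
#else
    (void)levels;   /* no OpenMP: everything runs on a single level */
    return 1;
#endif
}
```

Calling allow_nested_levels(2) before the outer parallel region would take the place of omp_set_nested(1) in the snippet above; the runtime may cap the value, so it is worth checking what the function returns.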

Hristo Iliev answered Sep 17 '22 00:09