
Is it possible to make a thread join a 'parallel for' region after it finishes its own job?

I have two jobs that need to run simultaneously at first:

1) a for loop that can be parallelized

2) a function that runs on a single thread

Now, let me describe what I want to do.

If 8 threads are available, job(1) and job(2) should start simultaneously, with 7 threads and 1 thread respectively.

After job(2) finishes, the thread it was using should be reassigned to job(1), the parallel for loop.

I'm using omp_get_thread_num to check which threads are active in each region. I would expect the number of threads in job(1) to increase by 1 when job(2) finishes.

Below is an attempt that may or may not be correct:

  omp_set_nested(1);
  #pragma omp parallel
  {
    #pragma omp sections
    {
      #pragma omp section // job(2)
      { // 'printf' is not real job. It is just used for simplicity.
        printf("i'm single: %d\n", omp_get_thread_num());
      }
      #pragma omp section // job(1)
      {
        #pragma omp parallel for schedule(dynamic, 32)
        for (int i = 0 ; i < 10000000; ++i) {
          // 'printf' is not real job. It is just used for simplicity.
          printf("%d\n", omp_get_thread_num());
        }
      }
    }
  }

How can I achieve this?

sungjun cho asked Jun 13 '19 05:06




3 Answers

What about something like this?

#pragma omp parallel
{
    // note the nowait here so that other threads jump directly to the for loop
    #pragma omp single nowait
    {
       job2();
    }

    #pragma omp for schedule(dynamic, 32)
    for (int i = 0 ; i < 10000000; ++i) {
        job1();
    }
}

I have not tested this, but the single block is executed by only one thread while all the others jump directly to the for loop thanks to the nowait. I also think it is easier to read than the version with sections.
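Since the snippet above is untested, a small harness can help check what it actually does. The following is only a sketch under stated assumptions: `run_single_then_for`, `MAX_THREADS`, and the `iters` counter array are hypothetical names introduced here, both jobs are replaced by trivial placeholders, and the serial fallback exists only so the file still builds when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption): lets the sketch build without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define MAX_THREADS 256 /* assumed upper bound on the team size */

/* Runs job(2) in a `single nowait` block and job(1) as a worksharing loop.
   On return, iters[t] holds the number of loop iterations thread t executed.
   Returns the id of the thread that ran job(2). */
int run_single_then_for(int n, long iters[MAX_THREADS]) {
    int single_id = -1;
    for (int t = 0; t < MAX_THREADS; ++t) iters[t] = 0;

    #pragma omp parallel
    {
        int me = omp_get_thread_num();
        long mine = 0; /* private per-thread iteration counter */
        assert(me < MAX_THREADS);

        /* Exactly one thread runs this block; nowait lets the others skip ahead. */
        #pragma omp single nowait
        {
            single_id = me; /* placeholder for job(2) */
        }

        /* All threads share the loop, including the one that ran job(2). */
        #pragma omp for schedule(dynamic, 32)
        for (int i = 0; i < n; ++i) {
            ++mine; /* placeholder for job(1) */
        }

        iters[me] = mine; /* each thread writes only its own slot */
    }
    /* single_id is read only after the barrier that ends the parallel region. */
    return single_id;
}
```

Calling `run_single_then_for(100000, iters)` from `main` and printing `iters[id]` for the returned id shows how many loop iterations the job(2) thread picked up afterwards. With GCC or Clang the OpenMP pragmas take effect when compiling with `-fopenmp`. How the iterations are split between threads is not deterministic, so the per-thread counts will vary from run to run.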

amlucas answered Oct 15 '22 01:10


Another way (and potentially the better way) to express this would be to use OpenMP tasks:

#pragma omp parallel master
{
    #pragma omp task // job(2)
    { // 'printf' is not real job. It is just used for simplicity.
        printf("i'm single: %d\n", omp_get_thread_num());
    }
    #pragma omp taskloop // job(1)
    for (int i = 0 ; i < 10000000; ++i) {
        // 'printf' is not real job. It is just used for simplicity.
        printf("%d\n", omp_get_thread_num());
    }
}

If you have a compiler that does not understand OpenMP version 5.0, then you have to split the parallel and master:

#pragma omp parallel
#pragma omp master
{
    #pragma omp task // job(2)
    { // 'printf' is not real job. It is just used for simplicity.
        printf("i'm single: %d\n", omp_get_thread_num());
    }
    #pragma omp taskloop // job(1)
    for (int i = 0 ; i < 10000000; ++i) {
        // 'printf' is not real job. It is just used for simplicity.
        printf("%d\n", omp_get_thread_num());
    }
}
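
To see how the work ends up distributed, the split form can be wrapped in a small harness. This is only a sketch: `run_task_and_taskloop`, `MAX_THREADS`, and the per-thread counter array `iters` are hypothetical names, the serial fallback is an assumption added so the file builds without OpenMP, and the loop body counts iterations instead of printing.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption): lets the sketch build without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define MAX_THREADS 256 /* assumed upper bound on the team size */

/* Creates one task for job(2) and a taskloop for job(1) from the master thread.
   On return, iters[t] holds the number of loop iterations thread t executed.
   Returns the id of the thread that executed the job(2) task. */
int run_task_and_taskloop(int n, long iters[MAX_THREADS]) {
    int job2_thread = -1;
    for (int t = 0; t < MAX_THREADS; ++t) iters[t] = 0;

    #pragma omp parallel
    #pragma omp master
    {
        /* job(2): a single task that any thread of the team may pick up. */
        #pragma omp task shared(job2_thread)
        {
            job2_thread = omp_get_thread_num();
        }

        /* job(1): the iterations are packed into tasks that any thread may run. */
        #pragma omp taskloop shared(iters)
        for (int i = 0; i < n; ++i) {
            int me = omp_get_thread_num();
            assert(me < MAX_THREADS);
            iters[me] += 1; /* each thread updates only its own slot */
        }
    }
    /* All tasks are complete once the parallel region has ended. */
    return job2_thread;
}
```

Which thread runs the job(2) task, and how many loop iterations each thread executes, is decided by the runtime's task scheduler, so the counts differ between runs and implementations.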

Michael Klemm answered Oct 15 '22 02:10


The problem comes from synchronization. At the end of the sections construct, OpenMP waits at an implicit barrier for all threads to finish, so the thread that ran job 2 is not released until that barrier has been passed.

The solution is to suppress the synchronization with nowait.
I did not manage to suppress the synchronization with sections and nested parallelism. I rarely use nested parallel regions, but I think that, while sections can take a nowait clause, there is a problem when spawning the new nested parallel region inside a section. There is a mandatory synchronization at the end of a parallel region that cannot be suppressed, and it probably prevents new threads from joining the pool.
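
One way to probe the nested variant is to report how many threads the inner region actually receives. The following is a rough sketch, not a definitive test: `inner_team_size` is a hypothetical helper, job(2) is an empty placeholder, the serial fallbacks are assumptions added so the file builds without OpenMP, and `omp_set_max_active_levels(2)` is used here to allow one level of nesting.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption): let the sketch build without OpenMP. */
static int omp_get_num_threads(void) { return 1; }
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Returns the number of threads in the parallel region nested inside the
   job(1) section. */
int inner_team_size(void) {
    int size = 0;
    omp_set_max_active_levels(2); /* allow one nested level */

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            /* job(2) placeholder */
        }
        #pragma omp section
        {
            /* Nested region for job(1). */
            #pragma omp parallel
            {
                /* One thread of the inner team records the team size. */
                #pragma omp single
                size = omp_get_num_threads();
            }
        }
    }
    assert(size >= 1); /* a parallel region always has at least one thread */
    return size;
}
```

The size of a team is fixed when its parallel region starts, so the value reported here does not grow when the thread running the other section becomes free later.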

What I did was use a single construct without synchronization. This way, OpenMP starts the single block and does not wait for its completion before starting the parallel for. When the thread finishes its single work, it joins the thread pool to finish processing the for loop.

#include <omp.h>
#include <stdio.h>

int main() {
  int singlethreadid = -1;  // id of the thread that ran job(2), -1 until known
  // omp_set_nested(1);
#pragma omp parallel
  {
#pragma omp single nowait  // job(2)
    { // 'printf' is not real job. It is just used for simplicity.
      printf("i'm single: %d\n", omp_get_thread_num());
      // atomic write: other threads may already be reading this in the loop
#pragma omp atomic write
      singlethreadid = omp_get_thread_num();
    }
#pragma omp for schedule(dynamic, 32)
    for (int i = 0; i < 100000; ++i) {
      int id;
#pragma omp atomic read
      id = singlethreadid;
      // 'printf' is not real job. It is just used for simplicity.
      printf("%d\n", omp_get_thread_num());
      if (omp_get_thread_num() == id)
        printf("Hello, I'm back\n");
    }
  }
  return 0;
}

Alain Merigot answered Oct 15 '22 03:10