How do I deal with a data race in OpenMP?

Tags: c, for-loop, openmp

I am trying to use OpenMP to add the numbers in an array. The following is my code:

  int* input = (int*) malloc(sizeof(int) * snum);
  int sum = 0;
  int i;
  for(i = 0; i < snum; i++){
      input[i] = i + 1;
  }
  #pragma omp parallel for schedule(static)
  for(i = 0; i < snum; i++)
  {
      int* tmpsum = input + i;
      sum += *tmpsum;
  }

This does not produce the right result for sum. What's wrong?

Asked by YOung on Nov 18 '14

2 Answers

Your code currently has a race condition, which is why the result is incorrect. To illustrate why, let's use a simple example:

Suppose you are running on 2 threads and the array is int input[4] = {1, 2, 3, 4};. You initialize sum to 0 correctly and are ready to start the loop. With static scheduling, thread 0 handles i = 0 and 1, while thread 1 handles i = 2 and 3. In their first iterations, both threads read sum from memory as 0, add their respective element to it, and write the result back. This means thread 0 is trying to write sum = 1 to memory (its first element is 1, and sum = 0 + 1 = 1), while thread 1 is trying to write sum = 3 (its first element is 3, and sum = 0 + 3 = 3). The end result depends on which thread happens to write to memory last, which is a race condition. Worse, in this particular case neither of the values the code could produce is the correct answer of 10. There are several ways to get around this; I'll detail three basic ones below:

#pragma omp critical:

In OpenMP, there is what is called a critical directive. It ensures that only one thread at a time executes the enclosed block of code. For example, your for-loop can be written:

#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
    int *tmpsum = input + i;
#pragma omp critical
    sum += *tmpsum;
}

This eliminates the race condition, as only one thread accesses and writes to sum at a time. However, critical serializes every update, which is very bad for performance and will likely wipe out a large portion (if not all) of the gains you get from using OpenMP in the first place.

#pragma omp atomic:

The atomic directive is very similar to the critical directive. The major difference is that, while critical can protect an arbitrary block of code, atomic applies only to a single, simple update of a scalar memory location (such as sum += x), which lets the compiler map it to a hardware atomic instruction. As the statement in this loop is exactly such an update, the directive works perfectly here:

#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
    int *tmpsum = input + i;
#pragma omp atomic
    sum += *tmpsum;
}

The performance of atomic is generally significantly better than that of critical. However, it is still not the best option in your particular case.

reduction:

The method you should use, and the method that has already been suggested by others, is reduction. You can do this by changing the for-loop to:

#pragma omp parallel for schedule(static) reduction(+:sum)
for(i = 0; i < snum; i++) {
    int *tmpsum = input + i;
    sum += *tmpsum;
}

The reduction clause tells OpenMP that, while the loop is running, each thread should keep a private copy of sum (initialized to 0 for the + operator), and that these private copies should be added into the original variable when the loop ends. This is the most efficient method: the entire loop runs in parallel, and the only synchronization happens once at the end, when the per-thread partial sums are combined.

Answered by wolfPack88 on Nov 01 '22

Use the reduction clause (description at MSDN).

int* input = (int*) malloc (sizeof(int)*snum);
int sum = 0;
int i;
for(i=0;i<snum;i++){
    input[i] = i+1;
}
#pragma omp parallel for schedule(static) reduction(+:sum)
for(i=0;i<snum;i++)
{
    sum += input[i];
}
Answered by Nikolay K on Nov 01 '22