Reducing on array in OpenMP

I am trying to parallelize the following program, but don't know how to reduce on an array. I know it is not possible to do so, but is there an alternative? Thanks. (I added a reduction on m, which is wrong, but I would like advice on how to do it properly.)

#include <iostream>
#include <stdio.h>
#include <time.h>
#include <omp.h>
using namespace std;

int main ()
{
  int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
  int S [10] = {0}; // note: must be zero-initialized before accumulating

  time_t start_time = time(NULL);
  #pragma omp parallel for private(m) reduction(+:m)
  for (int n=0 ; n<10 ; ++n ){
    for (int m=0; m<=n; ++m){
      S[n] += A[m];
    }
  }
  time_t end_time = time(NULL);
  cout << end_time-start_time;

  return 0;
}
asked Dec 06 '13 by user2891902


3 Answers

Yes, it is possible to do an array reduction with OpenMP. In Fortran there is even a construct for this. In C/C++ you have to do it yourself. Here are two ways to do it.

The first method makes a private version of S for each thread, fills it in parallel, and then merges the private copies into S in a critical section (see the code below). The second method makes an array with dimensions 10*nthreads, fills this array in parallel, and then merges it into S without using a critical section. The second method is more complicated and can have cache issues (false sharing), especially on multi-socket systems, if you are not careful. For more details see Fill histograms (array reduction) in parallel with OpenMP without using a critical section

First method

int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
#pragma omp parallel
{
    int S_private[10] = {0};
    #pragma omp for
    for (int n=0 ; n<10 ; ++n ) {
        for (int m=0; m<=n; ++m){
            S_private[n] += A[m];
        }
    }
    #pragma omp critical
    {
        for(int n=0; n<10; ++n) {
            S[n] += S_private[n];
        }
    }
}

Second method

int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
int S [10] = {0};
int *S_private;
#pragma omp parallel
{
    const int nthreads = omp_get_num_threads();
    const int ithread = omp_get_thread_num();

    #pragma omp single
    {
        S_private = new int[10*nthreads];
        for(int i=0; i<(10*nthreads); i++) S_private[i] = 0;
    }
    #pragma omp for
    for (int n=0 ; n<10 ; ++n )
    {
        for (int m=0; m<=n; ++m){
            S_private[ithread*10+n] += A[m];
        }
    }
    #pragma omp for
    for(int i=0; i<10; i++) {
        for(int t=0; t<nthreads; t++) {
            S[i] += S_private[10*t + i];
        }
    }
}
delete[] S_private;
answered Sep 20 '22 by Z boson

I have two remarks concerning Z boson's answer:
1. Method 1 is certainly correct, but the reduction loop is effectively run serially because of the #pragma omp critical, which is of course necessary: the partial arrays are local to each thread, and the corresponding reduction has to be done by the thread owning the array.
2. Method 2: the initialization loop can be moved outside the single section and thereby become parallelizable.

The following program implements the array reduction using the OpenMP 4.0 user-defined reduction facility:

/* Compile with:
     gcc -Wall -fopenmp -o ar ar.c
   Run with:
     OMP_DISPLAY_ENV=TRUE OMP_NUM_THREADS=10 OMP_NESTED=TRUE ./ar
*/
#include <stdio.h>
#include <omp.h>

struct m10x1 {int v[10];};
int A [] =       {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
struct m10x1 S = {{ 0,  0,  0,  0,  0,  0,  0,  0, 0,  0}};
int n,m=0;

void print_m10x1(struct m10x1 x){
  int i;
  for(i=0;i<10;i++) printf("%d ",x.v[i]);
  printf("\n");
}

struct m10x1 add_m10x1(struct m10x1 x,struct m10x1 y){
  struct m10x1 r ={{ 0,  0,  0,  0,  0,  0,  0,  0, 0,  0}};
  int i;
  for (i=0;i<10;i++) r.v[i]=x.v[i]+y.v[i];
  return r;
}

#pragma omp declare reduction(m10x1Add: struct m10x1: \
omp_out=add_m10x1(omp_out, omp_in)) initializer( \
omp_priv={{ 0,  0,  0,  0,  0,  0,  0,  0, 0,  0}} )

int main ()
{
  #pragma omp parallel for reduction(m10x1Add: S)
  for ( n=0 ; n<10 ; ++n )
    {
      for (m=0; m<=n; ++m){
        S.v[n] += A[m];
      }
    }
  print_m10x1(S);
}

This follows verbatim the complex number reduction example on page 97 of OpenMP 4.0 features.

Although the parallel version works correctly, there are probably performance issues, which I have not investigated:

  1. add_m10x1 inputs and output are passed by value.
  2. The loop in add_m10x1 is run serially.

These "performance issues" are of my own making, and it is completely straightforward not to introduce them:

  1. Parameters to add_m10x1 should be passed by reference (via pointers in C, references in C++).
  2. The computation in add_m10x1 should be done in place.
  3. add_m10x1 should be declared void and the return statement deleted. The result is returned via the first parameter.
  4. The declare reduction pragma should be modified accordingly: the combiner should be just a function call and not an assignment (v4.0 specs p181, lines 9-10).
  5. The for loop in add_m10x1 can be parallelized via an omp parallel for pragma.
  6. Parallel nesting should be enabled (e.g. via OMP_NESTED=TRUE).

The modified part of the code then is:

void add_m10x1(struct m10x1 * x,struct m10x1 * y){
  int i;
  #pragma omp parallel for
  for (i=0;i<10;i++) x->v[i] += y->v[i];
}

#pragma omp declare reduction(m10x1Add: struct m10x1: \
add_m10x1(&omp_out, &omp_in)) initializer( \
omp_priv={{ 0,  0,  0,  0,  0,  0,  0,  0, 0,  0}} )
answered Sep 18 '22 by NameOfTheRose


Since none of the other answers mentioned it, I am adding this answer.

I am trying to parallelize the following program, but don't know how to reduce on an array. I know it is not possible to do so, but is there an alternative?

With OpenMP 4.5 you can reduce arrays using pragmas, namely:

#pragma omp parallel for reduction(+:S)

A full running example:

#define S_SIZE 10
#include <stdio.h>
#include <time.h>
#include <omp.h>
int main ()
{
  int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
  int S [S_SIZE] = {0};

  #pragma omp parallel for reduction(+:S[:S_SIZE])
  for (int n=0 ; n<S_SIZE ; ++n ){
    for (int m=0; m<=n; ++m){
      S[n] += A[m];
    }
  }
  int expected_output [] = {84, 114, 209, 303, 339, 412, 464, 487, 489, 502};   
  for(int i = 0; i < S_SIZE; i++){
      if(S[i] == expected_output[i])
        printf("%d\n", S[i]);
     else
       printf("ERROR! it should have been %d instead of %d\n", expected_output[i], S[i]);
  }
  
  return 0;
}

Output:

84
114
209
303
339
412
464
487
489
502
answered Sep 20 '22 by dreamcrash