I'd like to parallelize this function, but I'm new to OpenMP and would be grateful if someone could help me:
    void my_function(float** A, int nbNeurons, int nbOutput, float* p, float* amp){
        float t=0;
        for(int r=0;r<nbNeurons;r++){
            t+=p[r];
        }
        for(int i=0;i<nbOutput;i++){
            float coef=0;
            for(int r=0;r<nbNeurons;r++){
                coef+=p[r]*A[r][i];
            }
            amp[i]=coef/t;
        }
    }
I don't know how to parallelize it properly because of the nested for loops. For the moment, the only thing I have thought of is:
    #pragma omp parallel for reduction(+:t)
But I don't think that is the best way to speed up the computation with OpenMP.
Thanks in advance,
First of all: we need to know the context. Where does your profiler tell you the most time is spent?
In general, coarse-grained parallelization works best, so as @Alex said: parallelize the outer for loop.
    void my_function(float** A, int nbNeurons, int nbOutput, float* p, float* amp)
    {
        float t=0;
        for(int r=0;r<nbNeurons;r++)
            t+=p[r];
        // each iteration writes a different amp[i], so the iterations are independent
        #pragma omp parallel for
        for(int i=0;i<nbOutput;i++){
            float coef=0;
            for(int r=0;r<nbNeurons;r++){
                coef+=p[r]*A[r][i];
            }
            amp[i]=coef/t;
        }
    }
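To tie this back to the reduction idea from the question, here is a minimal sketch that uses `reduction(+:t)` for the sum and a plain `parallel for` on the outer loop. The name `my_function_omp`, the `const` qualifier on `p`, and the `assert` are my own additions for illustration, not part of the original code. With GCC or Clang it needs to be compiled with `-fopenmp`; without that flag the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; same parameters as my_function in the question. */
void my_function_omp(float **A, int nbNeurons, int nbOutput,
                     const float *p, float *amp)
{
    assert(A != NULL && p != NULL && amp != NULL);

    /* Sum of p: each thread accumulates a private partial sum,
       and OpenMP adds the partial sums together at the end. */
    float t = 0.0f;
    #pragma omp parallel for reduction(+:t)
    for (int r = 0; r < nbNeurons; r++)
        t += p[r];

    /* coef is declared inside the loop body, so it is private to each
       thread; every iteration writes a different amp[i]. */
    #pragma omp parallel for
    for (int i = 0; i < nbOutput; i++) {
        float coef = 0.0f;
        for (int r = 0; r < nbNeurons; r++)
            coef += p[r] * A[r][i];
        amp[i] = coef / t;
    }
}
```

Whether the reduction on the first loop pays off depends on nbNeurons: for a short array the cost of starting the threads can exceed the cost of the sum itself. Also note that a floating-point reduction may add the terms in a different order than the serial loop, so the last digits of t can differ slightly.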
Depending on the actual data sizes, it may be worthwhile to compute t concurrently with the main loop, and to move the division out of the parallel loop:
    void my_function(float** A, int nbNeurons, int nbOutput, float* p, float* amp)
    {
        float t=0;
        #pragma omp parallel shared(amp)
        {
            #pragma omp single nowait // only a single thread executes this; the others do not wait for it
            {
                for(int r=0;r<nbNeurons;r++)
                    t+=p[r];
            }
            #pragma omp for
            for(int i=0;i<nbOutput;i++){
                float coef=0;
                for(int r=0;r<nbNeurons;r++){
                    coef+=p[r]*A[r][i];
                }
                amp[i]=coef; // not divided yet: t may still be being computed
            }
            // the omp for above already ends with an implicit barrier,
            // so this explicit barrier is redundant but harmless
            #pragma omp barrier
            #pragma omp master // only the master thread executes this
            {
                for(int i=0; i<nbOutput; i++){
                    amp[i] /= t;
                }
            }
        }
    }
Note: this code is untested. OpenMP has tricky semantics sometimes, so I might have missed a 'shared' declaration there. Nothing a profiler won't quickly notify you about, though.
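As a possible variation on the version above, the final division can be work-shared with a second `omp for` instead of being run by one thread, relying on the implicit barrier at the end of the first `omp for` to guarantee that t is complete. This is only a sketch under that assumption; the name `my_function_overlap` and the `const` qualifier are hypothetical, and whether it is actually faster is something only measurement can tell.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; same parameters as my_function in the question. */
void my_function_overlap(float **A, int nbNeurons, int nbOutput,
                         const float *p, float *amp)
{
    assert(A != NULL && p != NULL && amp != NULL);

    float t = 0.0f; /* declared outside the parallel region, so shared */

    #pragma omp parallel
    {
        /* One thread computes the sum; nowait lets the other threads
           start on the loop below without waiting for it. */
        #pragma omp single nowait
        {
            for (int r = 0; r < nbNeurons; r++)
                t += p[r];
        }

        /* Store the undivided sums first. */
        #pragma omp for
        for (int i = 0; i < nbOutput; i++) {
            float coef = 0.0f;
            for (int r = 0; r < nbNeurons; r++)
                coef += p[r] * A[r][i];
            amp[i] = coef;
        }
        /* Implicit barrier here: every thread, including the one that
           computed t, has finished, so t and amp are complete. */

        /* Divide in parallel instead of on a single thread. */
        #pragma omp for
        for (int i = 0; i < nbOutput; i++)
            amp[i] /= t;
    }
}
```

With the default loop schedule, the thread that computes t still has to process its own share of the first loop afterwards, so the overlap is not free. If nbNeurons is large, a clause such as `schedule(dynamic)` on the first `omp for` might balance that better, but that is a guess to verify with a profiler.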