Parallelizing nested loops. If we have nested for loops, it is often enough to simply parallelize the outermost loop:

a();
#pragma omp parallel for
for (int i = 0; i < 4; ++i) {
    for (int j = 0; j < 4; ++j) {
        c(i, j);
    }
}
z();

This is all that we need most of the time.
OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the new team created by a thread encountering a parallel construct inside a parallel region consists only of the encountering thread. If nested parallelism is enabled, then the new team may consist of more than one thread.
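As a hedged illustration (the thread counts and the printf are arbitrary, not taken from any particular program), the following sketch enables two active levels of nesting and shows inner teams being created; if the runtime keeps nesting disabled, each inner region instead runs on a team of one thread, as described above:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    //allow two active levels of parallelism (the older switch was omp_set_nested)
    omp_set_max_active_levels(2);

    #pragma omp parallel num_threads(2)        //outer team of 2 threads
    {
        int outer = omp_get_thread_num();

        #pragma omp parallel num_threads(2)    //each outer thread forks an inner team
        printf("outer thread %d, inner thread %d\n", outer, omp_get_thread_num());
    }
    return 0;
}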
In addition, while loops commonly check a condition that depends on the previous iteration's result. This is a read-after-write (true) dependency, and such a loop cannot be parallelized. Your slowdown problem might also be alleviated if you minimize the number of omp parallel regions, since each one pays the cost of forking and joining a team of threads.
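A minimal sketch of such a true dependency (the function and variable names are made up for illustration): each iteration reads the value written by the previous one, so the iterations must execute in order, and a plain #pragma omp parallel for on this loop would give wrong results:

//running prefix sum: out[i] depends on out[i-1] (read-after-write),
//so this loop cannot safely be parallelized as written
float prefix_sum(const float *in, float *out, int n)
{
    out[0] = in[0];
    for (int i = 1; i < n; ++i)
        out[i] = out[i - 1] + in[i];   //needs the result of iteration i-1
    return out[n - 1];
}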
OpenMP 4.0 added support for offloading code to different devices, such as a GPU. There can therefore be three layers of parallelism in a single program: a single thread processing multiple data elements (SIMD), multiple threads running simultaneously, and multiple devices running the same program simultaneously.
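A hedged sketch of how the three layers can be combined with OpenMP 4.0+ directives (the function and array names are invented for illustration, and the target construct only offloads when the compiler and runtime support a device; otherwise the loop falls back to the host):

//devices: offload the loop and its data to an accelerator, if present
//threads: distribute the iterations over teams of threads
//SIMD:    vectorize each thread's chunk of iterations
void scale(float *x, int n, float a)
{
    #pragma omp target teams distribute parallel for simd map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] *= a;
}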
I am trying to optimize the nested for loop in the function generate_histogram() below with OpenMP. I have experimented with many different combinations of pragmas based on what I've read in this SE post.
The problem is that the nested for loop runs faster without OpenMP than with OpenMP!
If I try to parallelize my code with a reduction instead of the atomic pragma, I end up with netchunk fails. Does anybody know a tweak for this one? I am binning data into a histogram; the histogram is variable in size in the real code, unlike in the snippet below.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <omp.h>

//treat float_t as plain float; defined after <math.h> so it does not
//clash with the float_t typedef that <math.h> provides
#define float_t float
float_t generate_histogram(float_t **matrix, int *histogram, int mat_size, int hist_size)
{
    int i, j, k;
    float_t max = 0.;
    float_t sum;

    //set histogram to zero everywhere
    for (i = 0; i < hist_size; i++)
        histogram[i] = 0;

    //matrix computations
    #pragma omp parallel for private(i) shared(histogram,j,k,max) schedule(dynamic)
    //#pragma omp parallel for schedule(runtime)
    for (i = 1; i < (mat_size-1); i++)
    {
        #pragma omp parallel for private(j,k) shared(histogram,max) schedule(dynamic)
        //#pragma omp parallel for schedule(dynamic)
        for (j = 1; j < (mat_size-1); j++)
        {
            //assign current matrix[i][j] to element in order to reduce memory access
            sum = fabs(matrix[i][j]-matrix[i-1][j]) + fabs(matrix[i][j] - matrix[i+1][j])
                + fabs(matrix[i][j]-matrix[i][j-1]) + fabs(matrix[i][j] - matrix[i][j+1]);

            //compute index of histogram bin
            k = (int)(sum * (float)mat_size);

            #pragma omp atomic
            histogram[k] += 1;

            //keep track of largest element
            if (sum > max)
                max = sum;
        } //end inner for
    } //end outer for
    return max;
}
int main(void)
{
    int i, j, N, boxes;
    N = 10000;
    float_t **matrix;
    int *histogram;
    boxes = N / 2;

    //allocate a matrix with some numbers
    matrix = calloc(N, sizeof(float_t *));       //N row pointers
    for (i = 0; i < N; i++)
        matrix[i] = calloc(N, sizeof(float_t));  //N values per row
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            matrix[i][j] = 1./(float_t) N * (float_t) i;

    histogram = malloc(boxes * sizeof(int));
    generate_histogram(matrix, histogram, N, boxes);
    return 0;
}
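For what it is worth, a commonly suggested alternative for this kind of binning (a sketch that assumes OpenMP 4.5 or newer for the array-section reduction, not code from the post above) is to parallelize only the outer loop and let reduction clauses combine both the per-thread histograms and the maximum, so no atomic is needed in the inner loop:

//sketch: single parallel loop over rows, array reduction on the histogram
//(each thread fills a private copy that is merged at the end) and a
//max reduction on the largest sum; variable names mirror the snippet above
float_t generate_histogram_reduced(float_t **matrix, int *histogram,
                                   int mat_size, int hist_size)
{
    float_t max = 0.;
    int i;

    for (i = 0; i < hist_size; i++)
        histogram[i] = 0;

    #pragma omp parallel for reduction(+:histogram[:hist_size]) reduction(max:max)
    for (i = 1; i < mat_size - 1; i++)
    {
        for (int j = 1; j < mat_size - 1; j++)
        {
            float_t sum = fabs(matrix[i][j] - matrix[i-1][j]) + fabs(matrix[i][j] - matrix[i+1][j])
                        + fabs(matrix[i][j] - matrix[i][j-1]) + fabs(matrix[i][j] - matrix[i][j+1]);
            int k = (int)(sum * (float_t)mat_size);
            if (k >= hist_size)          //defensive clamp in case sum scales past the last bin
                k = hist_size - 1;
            histogram[k] += 1;
            if (sum > max)
                max = sum;
        }
    }
    return max;
}

Whether this actually beats the serial version still depends on mat_size, the number of threads, and memory bandwidth, since every thread carries its own hist_size-sized copy of the histogram that has to be merged at the end.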