I want to test #pragma omp parallel for and #pragma omp simd on a simple matrix addition program. When I use each of them separately, I get no error and everything seems fine. I also want to test how much performance can be gained by using both of them together. If I use #pragma omp parallel for before the outer loop and #pragma omp simd before the inner loop, I get no error either. The error occurs when I use both of them before the outer loop. It is a compile-time error, not a runtime one (see the compiler output below). ICC and GCC report an error, but Clang does not; this may be because Clang rejects the parallelization. In my experiments, Clang does not parallelize and runs the program with only one thread.
Here is the program:
#include <stdio.h>
//#include <x86intrin.h>

#define N 512
#define M N

int __attribute__((aligned(32))) a[N][M],
    __attribute__((aligned(32))) b[N][M],
    __attribute__((aligned(32))) c_result[N][M];

int main()
{
    int i, j;

    #pragma omp parallel for
    #pragma omp simd
    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            c_result[i][j] = a[i][j] + b[i][j];
        }
    }
    return 0;
}
The error for ICC:

IMP1.c(20): error: omp directive is not followed by a parallelizable for loop
  #pragma omp parallel for
  ^

compilation aborted for IMP1.c (code 2)

GCC:

IMP1.c: In function ‘main’:
IMP1.c:21:10: error: for statement expected before ‘#pragma’
 #pragma omp simd
Because in my other tests #pragma omp simd on the outer loop gives better performance, I need to put it there (don't I?).
Platform: Intel Core i7 6700 HQ, Fedora 27
Tested compilers: ICC 18, GCC 7.2, Clang 5
Compiler command lines:
icc -O3 -qopenmp -xHOST -no-vec
gcc -O3 -fopenmp -march=native -fno-tree-vectorize -fno-tree-slp-vectorize
clang -O3 -fopenmp=libgomp -march=native -fno-vectorize -fno-slp-vectorize
#pragma omp parallel spawns a group of threads, while #pragma omp for divides loop iterations between the spawned threads.
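For illustration, the combined construct #pragma omp parallel for is shorthand for a parallel region containing a for construct. A minimal sketch on a hypothetical one-dimensional addition loop (vec_add is an illustrative helper, not from the question):

/* vec_add is a hypothetical helper used only for illustration. */
void vec_add(int n, const int *a, const int *b, int *c)
{
    #pragma omp parallel      /* spawn a team of threads              */
    {
        #pragma omp for       /* divide the iterations among the team */
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }                         /* implicit barrier; threads join here  */
}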
OpenMP SIMD, introduced in the OpenMP 4.0 standard, targets making loops vector-friendly. By placing the simd directive before a loop, the programmer tells the compiler that it can ignore assumed vector dependencies, make the loop as vector-friendly as possible, and respect the user's intention to have multiple loop iterations executed simultaneously.
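As the question already notes, the two directives can instead be split across the loop nest, threading the outer loop and vectorizing the inner one. A sketch of that working arrangement, using the same arrays as the program above:

#pragma omp parallel for      /* distribute rows across threads */
for (i = 0; i < N; i++) {
    #pragma omp simd          /* vectorize along each row       */
    for (j = 0; j < M; j++)
        c_result[i][j] = a[i][j] + b[i][j];
}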
Yes, "There is an implicit barrier at the end of the parallel construct." OpenMP Standard 4.5, 1.3 Execution Model, page 15.
OpenMP allows a programmer to separate a program into serial regions and parallel regions, rather than having to express it as T concurrently executing threads.
From OpenMP 4.5 Specification:
2.11.4 Parallel Loop SIMD Construct
The parallel loop SIMD construct is a shortcut for specifying a parallel construct containing one loop SIMD construct and no other statement.
The syntax of the parallel loop SIMD construct is as follows:
#pragma omp parallel for simd
...
You can also write:
#pragma omp parallel
{
    #pragma omp for simd
    for ...
}
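Applied to the program from the question, the combined construct would look like this (a minimal sketch, keeping the same globals and compiler flags as above):

#include <stdio.h>

#define N 512
#define M N

int __attribute__((aligned(32))) a[N][M],
    __attribute__((aligned(32))) b[N][M],
    __attribute__((aligned(32))) c_result[N][M];

int main()
{
    int i, j;

    /* One combined construct: iterations are divided among threads
       and each thread's chunk is vectorized. */
    #pragma omp parallel for simd
    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            c_result[i][j] = a[i][j] + b[i][j];
        }
    }
    return 0;
}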