
Compile OpenMP into pthreads C code

I understand that OpenMP is in fact just a set of macros which is compiled into pthreads. Is there a way of seeing the pthread code before the rest of the compilation occurs? I am using GCC to compile.

asked Feb 14 '13 by superbriggs

2 Answers

First, OpenMP is not just a simple set of macros. While it may look like a straightforward transformation into pthread-like code, OpenMP requires more than that, including runtime support.

Back to your question: in GCC, at least, you can't see the pthreaded code, because GCC's OpenMP implementation lives in the compiler back end (or middle end). The transformation is performed at the IR (intermediate representation) level, so from a programmer's viewpoint it is not easy to see how the code is actually transformed.

However, there are some references.

(1) An Intel engineer provided a great overview of the implementation of OpenMP in Intel C/C++ compiler:

http://www.drdobbs.com/parallel/how-do-openmp-compilers-work-part-1/226300148

http://www.drdobbs.com/parallel/how-do-openmp-compilers-work-part-2/226300277

(2) You may take a look at the implementation of GCC's OpenMP:

https://github.com/mirrors/gcc/tree/master/libgomp

Note that libgomp.h uses pthreads, and that loop.c contains the implementation of the parallel-loop construct.

answered Nov 14 '22 by minjang


OpenMP is a set of compiler directives, not macros. In C/C++ those directives are implemented with the #pragma extension mechanism, while in Fortran they are implemented as specially formatted comments. These directives instruct the compiler to perform certain code transformations in order to convert the serial code into parallel code.

Although it is possible to implement OpenMP as a transformation to pure pthreads code, this is seldom done. A large part of the OpenMP machinery is instead built into a separate run-time library, which comes as part of the compiler suite. For GCC this is libgomp. It provides a set of high-level functions that are used to implement the OpenMP constructs. The library is internal to the compiler and not intended to be used by user code, i.e. there is no header file provided for it.

With GCC it is possible to get a pseudocode representation of what the code looks like after the OpenMP transformation. You have to supply the -fdump-tree-all option, which makes the compiler emit a large number of intermediate files for each compilation unit. The most interesting one is filename.017t.ompexp (this comes from GCC 4.7.1; the number might be different in other GCC versions, but the extension will still be .ompexp). This file contains an intermediate representation of the code after the OpenMP constructs were lowered and then expanded into their proper implementation.

Consider the following example C code, saved as fun.c:

void fun(double *data, int n)
{
   #pragma omp parallel for
   for (int i = 0; i < n; i++)
     data[i] += data[i]*data[i];
}

The content of fun.c.017t.ompexp is:

fun (double * data, int n)
{
  ...
  struct .omp_data_s.0 .omp_data_o.1;
  ...

<bb 2>:
  .omp_data_o.1.data = data;
  .omp_data_o.1.n = n;
  __builtin_GOMP_parallel_start (fun._omp_fn.0, &.omp_data_o.1, 0);
  fun._omp_fn.0 (&.omp_data_o.1);
  __builtin_GOMP_parallel_end ();
  data = .omp_data_o.1.data;
  n = .omp_data_o.1.n;
  return;
}

fun._omp_fn.0 (struct .omp_data_s.0 * .omp_data_i)
{
  int n [value-expr: .omp_data_i->n];
  double * data [value-expr: .omp_data_i->data];
  ...

<bb 3>:
  i = 0;
  D.1637 = .omp_data_i->n;
  D.1638 = __builtin_omp_get_num_threads ();
  D.1639 = __builtin_omp_get_thread_num ();
  ...

<bb 4>:
  ... this is the body of the loop ...
  i = i + 1;
  if (i < D.1644)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 5>:

<bb 6>:
  return;

  ...
}

I have omitted large portions of the output for brevity. This is not exactly C code; it is a C-like representation of the program flow. <bb N> are the so-called basic blocks - collections of statements treated as single blocks in the program's control flow.

The first thing one sees is that the parallel region gets extracted into a separate function. This is not uncommon - most OpenMP implementations do more or less the same code transformation. One can also observe that the compiler inserts calls to libgomp functions such as GOMP_parallel_start and GOMP_parallel_end, which are used to bootstrap and then finish the execution of a parallel region (the __builtin_ prefix is removed later on).

Inside fun._omp_fn.0 there is a for loop, implemented in <bb 4> (note that the loop itself is also expanded). All shared variables are put into a special structure that gets passed to the implementation of the parallel region. <bb 3> contains the code that computes the range of iterations over which the current thread operates.

Well, not quite C code, but this is probably the closest thing one can get out of GCC.

answered Nov 14 '22 by Hristo Iliev