I've been struggling with a weird problem for the last few days. We build some libraries with GCC 4.8 that link some of their dependencies statically, e.g. log4cplus or boost. For these libraries we have created Python bindings using boost-python.
Every time such a library used TLS (as log4cplus does in its static initialization, or as libstdc++ does when throwing an exception, so not only during the initialization phase), the whole thing crashed with a segfault, and every time the address of the thread-local variable was 0.
I tried everything: recompiling, ensuring -fPIC is used, ensuring -ftls-model=global-dynamic is used, etc. No success. Then today I found out that the reason for these crashes was the way we linked OpenMP in. We did this using "-lgomp" instead of just using "-fopenmp". Since I changed this, everything works fine: no crashes, no nothing. Fine!
But I'd really like to know what the cause of the problem was. So what's the difference between these two possibilities to link in OpenMP?
We have a CentOS 5 machine here where we have installed GCC 4.8 in /opt/local/gcc48, and we are sure that the libgomp from /opt/local/gcc48 was used, as well as the libstdc++ from there (checked with LD_DEBUG).
Any ideas? Haven't found anything on Google - or I used the wrong keywords :)
Parallel code with OpenMP uses special directives to mark sections to be executed in parallel. The marked code causes threads to be created. The main thread is the master thread. The slave threads all run in parallel and run the same code.
A shared variable has the same address in the execution context of every thread, so all threads have access to it. A private variable has a different address in the execution context of every thread.
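The shared/private distinction can be made visible by recording the addresses each thread sees. The following is only a minimal sketch: the helper name record_addresses and the constant NUM_THREADS are made up for illustration, and the code deliberately avoids omp.h so that it still compiles (serially) when the pragmas are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_THREADS 4  /* illustrative upper bound, not from the original post */

/* Hypothetical helper: each thread stores the address it sees for a shared
 * variable and for a private one. Returns how many threads ran the region
 * (1 if the file was compiled without -fopenmp). */
int record_addresses(uintptr_t shared_addr[NUM_THREADS],
                     uintptr_t private_addr[NUM_THREADS])
{
    int shared_value = 0;   /* one instance, seen by every thread */
    int private_value = 0;  /* each thread gets its own copy inside the region */
    int nthreads = 0;       /* shared counter, only updated in the critical section */

    #pragma omp parallel num_threads(NUM_THREADS) shared(shared_value, nthreads) private(private_value)
    {
        /* One thread at a time: claim the next slot and record both addresses. */
        #pragma omp critical
        {
            assert(nthreads < NUM_THREADS);
            shared_addr[nthreads] = (uintptr_t)&shared_value;
            private_addr[nthreads] = (uintptr_t)&private_value;
            nthreads++;
        }
    }
    return nthreads;
}
```

Called from a main function and built with gcc -fopenmp, every entry of shared_addr should be identical, while the entries of private_addr should differ from thread to thread.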
OpenMP is an implementation of multithreading, a method of parallelizing whereby a primary thread (a series of instructions executed consecutively) forks a specified number of sub-threads and the system divides a task among them.
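As a rough illustration of a task being divided among threads, here is a small sketch of a summation loop. The function name parallel_sum is hypothetical. With -fopenmp the iterations are split among the threads and the partial sums are combined by the reduction clause; without it the pragma is ignored and the loop simply runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums data[0..n-1]. When OpenMP is enabled, each thread
 * sums its share of the iterations into its own copy of total, and the copies
 * are added together at the end of the loop. */
long parallel_sum(const int *data, size_t n)
{
    long total = 0;

    assert(data != NULL || n == 0);

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += data[i];
    }
    return total;
}
```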
OpenMP is an intermediary between your code and its execution. Each #pragma omp statement is converted to a call to the corresponding OpenMP library function, and that's all there is to it. The multithreaded execution (launching threads, joining and synchronizing them, etc.) is always handled by the operating system (OS). All OpenMP does is handle these low-level, OS-dependent threading calls for us portably, behind a short and sweet interface.
The -fopenmp flag is a high-level one that does more than include GCC's OpenMP implementation (gomp). The gomp library requires further libraries to access the threading functionality of the OS. On POSIX-compliant OSes, OpenMP is usually based on pthread, which needs to be linked. It may also need the realtime extension library (librt) on some OSes, but not on others. When using dynamic linking, everything should be discovered automatically, but when you specify -static, I think you fall into the situation described by Jakub Jelinek here. Nowadays, though, pthread (and rt if needed) should be linked automatically when -static is used.
Aside from linking dependencies, the -fopenmp flag also activates the processing of some pragma statements. You can see throughout the GCC code (as here and here) that without the -fopenmp flag (which isn't triggered by merely linking the gomp library), multiple pragmas won't be converted to the appropriate OpenMP function calls. I just tried with some example code, and both -lgomp and -fopenmp produce a working executable that links against the same libraries. The only difference in my simple example is that the -fopenmp build has a symbol that the -lgomp build doesn't have: GOMP_parallel@@GOMP_4.0+ (code here), which is the function that initializes the parallel section, performing the forks requested by the #pragma omp parallel in my example code. Thus, the -lgomp version did not translate the pragma into a call to GCC's OpenMP implementation. Both produced a working executable, but only the -fopenmp flag produced a parallel executable in this case.
To wrap up, -fopenmp is needed for GCC to process all the OpenMP pragmas. Without it, your parallel sections won't fork any threads, which could wreak havoc depending on the assumptions your inner code was written under.
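One way to check from inside a program whether the pragmas were actually processed is to look at the standard _OPENMP macro and to count how many threads enter a parallel region. This is only a sketch: the function names openmp_version and count_parallel_threads are invented for illustration, and the code avoids omp.h so that it also builds when the flag is missing.

```c
#include <assert.h>

/* Returns the value of the _OPENMP macro (a yyyymm date) when the compiler
 * processed OpenMP pragmas, or 0 when it did not (e.g. no -fopenmp). */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

/* Hypothetical helper: counts how many threads execute the parallel region.
 * If the pragmas are ignored, the block runs once and the result is 1. */
int count_parallel_threads(int requested)
{
    int count = 0;

    assert(requested >= 1);

    #pragma omp parallel num_threads(requested)
    {
        /* Atomic increment so concurrent updates are not lost. */
        #pragma omp atomic
        count++;
    }
    return count;
}
```

Built with only -lgomp, openmp_version() should return 0 and count_parallel_threads(4) should return 1; built with -fopenmp, the version is nonzero and the count is normally greater than 1, although the runtime may provide fewer threads than requested.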