Consider the following snippet:
#include <map>
class A {
static std::map<int,int> theMap;
#pragma omp threadprivate(theMap)
};
std::map<int,int> A::theMap;
Compilation with OpenMP fails with the following error message:
$ g++ -fopenmp -c main.cpp
main.cpp:5:34: error: ‘threadprivate’ ‘A::theMap’ has incomplete type
I don't understand this. I can compile without the #pragma directive, which should mean that std::map is not incomplete. I can also compile if theMap is a primitive type (double, int...).
How do I make a global static std::map threadprivate?
Summary: The threadprivate directive specifies that variables are replicated, with each thread having its own copy. The threadprivate directive is a declarative directive.
#pragma omp parallel spawns a group of threads, while #pragma omp for divides loop iterations between the spawned threads. You can do both things at once with the fused #pragma omp parallel for directive.
The THREADPRIVATE directive allows you to specify named common blocks and named variables as private to a thread but global within that thread. Once you declare a common block or variable THREADPRIVATE, each thread in the team maintains a separate copy of that common block or variable.
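As a rough illustration of the primitive-type case that the question says compiles fine, here is a minimal sketch of a threadprivate int. The names counter and sumOfThreadCounters, and the team size of 4, are made up for this example. Each thread counts the loop iterations it executed in its own copy of counter, and a reduction adds the per-thread counts together.

```cpp
#include <cassert>

// File-scope variable with a trivial initializer; every thread gets its own copy.
static int counter = 0;
#pragma omp threadprivate(counter)

// Hypothetical helper: returns the sum of all per-thread counters after the loop.
int sumOfThreadCounters(int n) {
    assert(n >= 0);
    int total = 0;
#pragma omp parallel num_threads(4) reduction(+ : total)
    {
        counter = 0;  // resets only the calling thread's copy
#pragma omp for
        for (int i = 0; i < n; ++i) {
            ++counter;  // no race: counter is private to each thread
        }
        total += counter;  // combined across threads by the reduction
    }
    return total;
}
```

If the file is compiled without -fopenmp, the pragmas are ignored and the function still returns n, since a single thread then runs every iteration.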
This is a compiler restriction. The Intel C/C++ compiler supports C++ class types in threadprivate, while GCC and MSVC currently do not.
For example, in MSVC (VS 2010), you will get this error (I removed the class):
static std::map<int,int> theMap;
#pragma omp threadprivate(theMap)
error C3057: 'theMap' : dynamic initialization of 'threadprivate' symbols is not currently supported
So, the workaround is pretty obvious, but dirty. You need to make a very simple thread-local storage. A simple approach would be:
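#include <map>

// Upper bound on the number of OpenMP threads; the team must not exceed it.
const static int MAX_THREAD = 64;
struct MY_TLS_ITEM
{
  std::map<int,int> theMap;
  // Pad each item to 64 bytes; this assumes sizeof(theMap) < 64.
  char padding[64 - sizeof(theMap)];
};
// One item per thread, with the array starting on a 64-byte boundary (MSVC syntax).
__declspec(align(64)) MY_TLS_ITEM tls[MAX_THREAD];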
Note that the padding is there to avoid false sharing. I assume a 64-byte cache line, as on modern Intel x86 processors. __declspec(align(64)) is an MSVC extension that places the structure on a 64-byte boundary, so every element of tls sits on a different cache line and no false sharing occurs. GCC has __attribute__ ((aligned(64))) for the same purpose.
In order to access this simple TLS, you can do this:
tls[omp_get_thread_num()].theMap;
Of course, you should call this inside an OpenMP parallel construct. The nice thing is that OpenMP provides an abstracted thread ID in [0, N), where N is the number of threads in the team, which enables a fast and simple TLS implementation. In general, a native thread ID from the operating system is an arbitrary integer, so you would typically need a hash table, whose access time is longer than that of a simple array.
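For readers not on MSVC, the same idea can be sketched portably, assuming a C++11 compiler so that alignas can replace the vendor-specific alignment attributes. The names Slot, slots, localMap, fillAndCount and kTeamSize are invented for this sketch, and the 64-byte cache line and 64-thread limit are assumptions carried over from the answer rather than measured values.

```cpp
#include <cassert>
#include <map>
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumed values: 64-byte cache lines, at most 64 threads, teams of 4.
const int kCacheLine = 64;
const int kMaxThreads = 64;
const int kTeamSize = 4;

// alignas rounds sizeof(Slot) up to a multiple of kCacheLine, so adjacent
// array elements never share a cache line and no manual padding is needed.
struct alignas(kCacheLine) Slot {
    std::map<int, int> theMap;
};
static_assert(sizeof(Slot) % kCacheLine == 0, "Slot must fill whole cache lines");

static Slot slots[kMaxThreads];

// Returns the calling thread's own map. Falls back to slot 0 when the file
// is compiled without OpenMP support.
std::map<int, int>& localMap() {
#ifdef _OPENMP
    const int tid = omp_get_thread_num();
#else
    const int tid = 0;
#endif
    assert(tid >= 0 && tid < kMaxThreads);
    return slots[tid].theMap;
}

// Each thread stores the iterations it executed in its own map; the return
// value is the total number of entries across all slots afterwards.
int fillAndCount(int n) {
#pragma omp parallel for num_threads(kTeamSize)
    for (int i = 0; i < n; ++i) {
        localMap()[i] = i * i;
    }
    int total = 0;
    for (int t = 0; t < kMaxThreads; ++t) {
        total += static_cast<int>(slots[t].theMap.size());
    }
    return total;
}
```

A static array is used rather than a std::vector because the default allocator is not guaranteed to honor 64-byte alignment before C++17. The trade-off is the fixed upper bound on the thread count, which the assert checks at run time.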
The incomplete type error is a bug in the compiler which can be worked around by instantiating std::map<int,int> before the threadprivate directive. But once you get past that issue, GCC 4.7 still doesn't support dynamic initialization of threadprivate variables. This will be supported in GCC 4.8.
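The answer does not say how to instantiate the type, so the following is only one possible sketch: a static_assert on sizeof, which needs a complete type and therefore forces std::map<int,int> to be instantiated before the directive is seen. The put and get members are hypothetical additions so that the map can be exercised. Whether the workaround is needed at all, and whether the dynamically initialized threadprivate map is then accepted, depends on the compiler version as described above.

```cpp
#include <cassert>
#include <map>

// sizeof requires a complete type, so this forces the compiler to
// instantiate std::map<int,int> before it reaches the directive below.
static_assert(sizeof(std::map<int, int>) > 0, "force instantiation");

class A {
    static std::map<int, int> theMap;
#pragma omp threadprivate(theMap)
public:
    // Hypothetical accessors, added only so the example can be exercised.
    static void put(int key, int value);
    static int get(int key);
};

std::map<int, int> A::theMap;

void A::put(int key, int value) { theMap[key] = value; }

int A::get(int key) {
    std::map<int, int>::const_iterator it = theMap.find(key);
    assert(it != theMap.end());  // the key must have been stored by this thread
    return it->second;
}
```

Because each thread sees its own theMap, a value stored by one thread is not visible to get when it is called from another thread.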