I’m using OpenMP and need to use the fetch-and-add operation. However, OpenMP doesn’t provide an appropriate directive/call. I’d like to preserve maximum portability, hence I don’t want to rely on compiler intrinsics.
Rather, I’m searching for a way to harness OpenMP’s atomic operations to implement this but I’ve hit a dead end. Can this even be done? N.B., the following code almost does what I want:
#pragma omp atomic
x += a;
Almost, but not quite, since I really need the old value of x. fetch_and_add should be defined to produce the same result as the following (only non-locking):
template <typename T>
T fetch_and_add(volatile T& value, T increment) {
T old;
#pragma omp critical
{
old = value;
value += increment;
}
return old;
}
(An equivalent question could be asked for compare-and-swap but one can be implemented in terms of the other, if I’m not mistaken.)
As of OpenMP 3.1 there is support for capturing atomic updates: you can capture either the old value or the new value. Since the value has to be brought in from memory to be incremented anyway, it makes sense that we should be able to read it from, say, a CPU register and store it in a thread-private variable.
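A minimal sketch of the two capture forms, assuming a compiler that implements OpenMP 3.1. The names fetch_then_add, add_then_fetch and claims_are_unique are made up for illustration and are not part of OpenMP. The last function shows a typical use: handing out unique slot indices from a shared counter.

```cpp
#include <cassert>
#include <vector>

// Returns the value *x held before the addition (fetch-and-add).
template <typename T>
T fetch_then_add(T *x, T a) {
    T old_value;
    #pragma omp atomic capture
    { old_value = *x; *x += a; }
    return old_value;
}

// Returns the value *x holds after the addition (add-and-fetch).
template <typename T>
T add_then_fetch(T *x, T a) {
    T new_value;
    #pragma omp atomic capture
    { *x += a; new_value = *x; }
    return new_value;
}

// Hypothetical usage: every loop iteration claims one slot index.
// Returns true if each slot in [0, n) was claimed exactly once.
bool claims_are_unique(int n) {
    std::vector<int> hits(n, 0);
    int next = 0;  // shared counter
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int slot = fetch_then_add(&next, 1);
        assert(slot >= 0 && slot < n);
        hits[slot] += 1;  // safe: each slot index is handed out only once
    }
    for (int i = 0; i < n; ++i)
        if (hits[i] != 1) return false;
    return next == n;
}
```

If the compiler ignores the pragmas (OpenMP disabled), the code simply runs serially and behaves the same way.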
There's a nice work-around if you're using gcc (or g++): look up the atomic builtins at http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
I think Intel's C/C++ compiler also has support for this, but I haven't tried it.
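For now (until OpenMP 3.1 is implemented), I've used inline wrapper functions in C++ where you can choose which version to use at compile time:
template <class T>
inline T my_fetch_add(T *ptr, T val) {
#if defined(GCC_EXTENSION)
    // GCC builtin: atomically adds val and returns the previous value.
    return __sync_fetch_and_add(ptr, val);
#elif defined(OPENMP_3_1)
    // OpenMP 3.1: capture the old value as part of the atomic update.
    T t;
    #pragma omp atomic capture
    { t = *ptr; *ptr += val; }
    return t;
#else
#error "Define GCC_EXTENSION or OPENMP_3_1"
#endif
}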
Update: I just tried Intel's C++ compiler; it currently supports OpenMP 3.1 (atomic capture is implemented). Intel offers free use of its compilers on Linux for non-commercial purposes:
http://software.intel.com/en-us/articles/non-commercial-software-download/
GCC 4.7 will support OpenMP 3.1 when it is eventually released... hopefully soon :)
If you want the old value of x and a is not changed, use (x - a) as the old value:
int fetch_and_add(int *x, int a) {
    #pragma omp atomic
    *x += a;
    return (*x - a);
}
UPDATE: this was not really an answer, because x can be modified by another thread after the atomic update and before the read in the return statement. So it seems to be impossible to build a universal "fetch-and-add" using plain OMP pragmas. By universal I mean an operation that can easily be used from any place in OMP code.
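To make the problem concrete, here is a small sketch that replays one possible interleaving of two threads by hand, in a single thread. The function name and the increments 1 and 5 are chosen only for illustration.

```cpp
#include <cassert>

// Replays, sequentially, one interleaving in which thread B's update
// lands between thread A's atomic add and thread A's later read.
// Returns what thread A would report as the "old" value.
int replay_broken_fetch_and_add() {
    int x = 0;
    const int a = 1;  // thread A's increment
    const int b = 5;  // thread B's increment

    x += a;                  // thread A: atomic update, x goes 0 -> 1
    x += b;                  // thread B: atomic update, x goes 1 -> 6
    int seen_by_a = x - a;   // thread A: separate read, computes 6 - 1

    // The true old value for thread A was 0, but it computes something else.
    assert(seen_by_a != 0);
    return seen_by_a;
}
```

In this interleaving thread A reports 5 even though x was 0 when its own update happened, because the read of x is not part of the atomic operation.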
You can use the omp_*_lock functions to simulate atomics:
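#include <omp.h>

// The lock must be initialized with omp_init_lock(&x->lock) before first use.
typedef struct { omp_lock_t lock; int value; } atomic_simulated_t;

int fetch_and_add(atomic_simulated_t *x, int a)
{
    int ret;
    omp_set_lock(&x->lock);
    ret = x->value;   // remember the old value
    x->value += a;
    omp_unset_lock(&x->lock);
    return ret;
}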
This is ugly and slow (it performs two atomic operations instead of one). But if you want your code to be very portable, it will not be the fastest in all cases.
You say "as the following (only non-locking)". But what is the difference between "non-locking" operations (using the CPU's LOCK prefix, LL/SC, etc.) and locking operations (which are themselves implemented with several atomic instructions, a busy loop for short waits, and OS sleeping for long waits)?