I have an array of floats which get updated by various threads. The size of the array is much larger than the number of threads, so simultaneous access to any particular float is rather rare. I need a solution for C++03.
The following code atomically adds a value to one of the floats (live demo). Assuming it works, it might be the best solution. The only alternative I can think of is dividing the array into chunks and protecting each chunk with a mutex, but I don't expect the latter to be more efficient.
My questions are as follows. Are there any alternative solutions for adding floats atomically? Can anyone anticipate which is the most efficient? Yes, I am willing to do some benchmarks. Maybe the solution below can be improved by relaxing the memory-order constraints, i.e. replacing __ATOMIC_SEQ_CST with something else. I have no experience with that.
void atomic_add_float( float *x, float add )
{
    int *ip_x= reinterpret_cast<int*>( x ); //1
    int expected= __atomic_load_n( ip_x, __ATOMIC_SEQ_CST ); //2
    int desired;
    do {
        float sum= *reinterpret_cast<float*>( &expected ) + add; //3
        desired= *reinterpret_cast<int*>( &sum );
    } while( ! __atomic_compare_exchange_n( ip_x, &expected, desired, //4
                                            /* weak = */ true,
                                            __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST ) );
}
This works as follows. At //1 the bit pattern of x is reinterpreted as an int, i.e. I assume that float and int have the same size (32 bits). At //2 the value to be increased is loaded atomically. At //3 the bit pattern of the int is reinterpreted as a float and the summand is added. (Remember that expected contains the value found at ip_x == x.) This doesn't change the value stored at ip_x == x. At //4 the result of the summation is stored at ip_x == x only if no other thread has changed the value in the meantime, i.e. if expected == *ip_x (docu). If this is not the case, the do-loop continues and expected contains the updated value found at ip_x == x.
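To make step //4 concrete, here is a small single-threaded sketch of how the builtin treats a stale expected value. The helper name demo_compare_exchange_updates_expected is made up for illustration, and the strong variant (weak = false) is used only so that the first failure cannot be a spurious one.

```cpp
// Hypothetical helper (not part of the question's code): returns true if
// __atomic_compare_exchange_n behaves as described for step //4.
bool demo_compare_exchange_updates_expected()
{
    int value = 5;     // stands in for *ip_x
    int expected = 3;  // stale snapshot, as if another thread had changed value

    // Strong compare-exchange: it fails because value != expected,
    // leaves value untouched and copies the current value into expected.
    bool first = __atomic_compare_exchange_n( &value, &expected, 7,
                                              /* weak = */ false,
                                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST );
    bool stale_rejected = !first && value == 5 && expected == 5;

    // The retry now succeeds, because expected matches value again.
    bool second = __atomic_compare_exchange_n( &value, &expected, 7,
                                               /* weak = */ false,
                                               __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST );
    bool retry_stored = second && value == 7;

    return stale_rejected && retry_stored;
}
```

This is the reason the do-loop never needs to reload the value itself: a failed compare-exchange has already refreshed expected.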
GCC's functions for atomic access (__atomic_load_n and __atomic_compare_exchange_n) can easily be replaced by other compilers' equivalents.
Are there any alternative solutions for adding floats atomically? Can anyone anticipate which is the most efficient?
Sure, there are at least a few that come to mind:
Use synchronization primitives, e.g. spinlocks. This will likely be a bit slower than compare-exchange.
Use a transactional memory extension (see Wikipedia). This may be faster, but it limits portability.
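For comparison, the spinlock alternative could look roughly like the sketch below. It also covers the idea of protecting groups of floats with one lock each, except that floats are mapped to locks by index modulo the stripe count rather than by contiguous chunk. The names kNumStripes, g_stripe_locks and locked_add_float, as well as the stripe count of 64, are assumptions made for illustration; the spinlock is built from GCC's __atomic_test_and_set and __atomic_clear.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical tuning constant: number of lock stripes.
static const std::size_t kNumStripes = 64;

// One lock flag per stripe; static storage is zero-initialized,
// so all locks start out unlocked.
static bool g_stripe_locks[kNumStripes];

// Adds `add` to data[index] while holding the spinlock of the stripe
// that `index` maps to.
void locked_add_float( float *data, std::size_t index, float add )
{
    bool *lock = &g_stripe_locks[index % kNumStripes];

    // Spin until the flag was previously clear. Acquire ordering makes the
    // previous lock holder's write to the array visible to this thread.
    while( __atomic_test_and_set( lock, __ATOMIC_ACQUIRE ) ) {
        // busy-wait
    }

    data[index] += add;

    // Release ordering publishes the update before the lock is freed.
    __atomic_clear( lock, __ATOMIC_RELEASE );
}
```

Note that neighbouring lock flags share a cache line here, so padding them might matter in a benchmark. Whether this beats the compare-exchange loop would have to be measured.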
Overall, your solution is quite reasonable: it is fast and will work on any platform that provides these atomic builtins.
In my opinion the needed memory orders are:
__ATOMIC_ACQUIRE -- when we read the value in __atomic_load_n()
__ATOMIC_RELEASE -- when __atomic_compare_exchange_n() succeeds
__ATOMIC_ACQUIRE -- when __atomic_compare_exchange_n() fails
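Applied to the function from the question, those memory orders would look roughly like this. It is a sketch, not a tested drop-in replacement: the name atomic_add_float_acq_rel is made up, and the bit patterns of the local variables are copied with std::memcpy instead of the reinterpret_cast dereferences, which is a deliberate deviation from the original code. The atomic access to *x through an int pointer is unchanged.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical variant of atomic_add_float with weaker memory orders.
// Assumes sizeof(float) == sizeof(int), as in the question.
void atomic_add_float_acq_rel( float *x, float add )
{
    int *ip_x = reinterpret_cast<int*>( x );

    // Acquire when reading the current value.
    int expected = __atomic_load_n( ip_x, __ATOMIC_ACQUIRE );
    int desired;
    do {
        // Copy the bit pattern int -> float, add, and copy it back.
        float current;
        std::memcpy( &current, &expected, sizeof current );
        float sum = current + add;
        std::memcpy( &desired, &sum, sizeof desired );
    } while( ! __atomic_compare_exchange_n( ip_x, &expected, desired,
                                            /* weak = */ true,
                                            __ATOMIC_RELEASE,     // on success
                                            __ATOMIC_ACQUIRE ) ); // on failure
}
```

Whether this is measurably faster than the __ATOMIC_SEQ_CST version depends on the target architecture, so it is worth including both in the benchmark.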