I have some worker threads performing time-critical processing at regular intervals (approx 1 kHz). Each cycle, the workers are woken up to do a chore, each of which should (on average) be complete before the next cycle begins. They operate on the same object, which can be occasionally modified by the main thread.
To prevent races, but allow a modification of the object to occur before the next cycle, I have used a spin-lock along with an atomic counter to record how many threads are still doing work:
#include <atomic>

class Foo {
public:
    void Modify();
    void DoWork( SomeContext& );
private:
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
    std::atomic<int> workers_busy{ 0 };  // brace-init: std::atomic is not copy-initialisable in C++11
};
void Foo::Modify()
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ; // spin
    while( workers_busy.load() != 0 ) ; // spin
    // Modifications happen here ....
    locked.clear( std::memory_order_release );
}
void Foo::DoWork( SomeContext& )
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ; // spin
    ++workers_busy;
    locked.clear( std::memory_order_release );
    // Processing happens here ....
    --workers_busy;
}
This allows all remaining work to be completed immediately, provided at least one worker has already begun, and Modify() will always block until the workers are done, before another worker can begin work for the next cycle.
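For context, here is a minimal sketch of how the threads might drive these two functions (the cycle timing, the SomeContext definition and the worker loop shape are illustrative assumptions, not part of my actual code):

#include <atomic>
#include <chrono>
#include <thread>

struct SomeContext {};  // stand-in for whatever per-worker state the real code uses

// One worker: wake up roughly every millisecond (~1 kHz) and do one chore.
void WorkerLoop( Foo& foo, SomeContext& ctx, std::atomic<bool>& running )
{
    while( running.load() ) {
        foo.DoWork( ctx );
        std::this_thread::sleep_for( std::chrono::milliseconds( 1 ) );  // simplified cycle timing
    }
}

// Main thread, occasionally, between cycles:
//     foo.Modify();  // spins until no worker is mid-chore, then applies the change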
The atomic_flag is accessed with "acquire" and "release" memory orders, which appears to be an accepted way of implementing spin-locks with C++11. According to the documentation at cppreference.com:
memory_order_acquire: A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load. This ensures that all writes in other threads that release the same atomic variable are visible in the current thread.

memory_order_release: A store operation with this memory order performs the release operation: no memory accesses in the current thread can be reordered after this store. This ensures that all writes in the current thread are visible in other threads that acquire the same atomic variable and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic.
As I understand the above, this is enough to synchronise protected accesses across threads to provide mutex behaviour, without being overly conservative about memory ordering.
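To make the acquire/release pairing concrete, here is a minimal, generic example (not my actual code) of how a release store publishes earlier writes to a thread whose acquire load observes it:

#include <atomic>
#include <cassert>
#include <thread>

int data = 0;
std::atomic<bool> ready( false );

void producer()
{
    data = 42;                                       // plain (non-atomic) write
    ready.store( true, std::memory_order_release );  // release: publishes the write to data
}

void consumer()
{
    while( !ready.load( std::memory_order_acquire ) ) ;  // acquire: spin until the flag is set
    assert( data == 42 );  // guaranteed: the acquire load synchronises with the release store
}

int main()
{
    std::thread t1( producer ), t2( consumer );
    t1.join();
    t2.join();
}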
What I wish to know is whether the memory ordering can be further relaxed because a side-effect of this pattern is that I'm using a spin-lock mutex to synchronise another atomic variable.
The calls to ++workers_busy, --workers_busy and workers_busy.load() all currently have the default memory order, memory_order_seq_cst. Given that the only interesting use for this atomic is to unblock Modify() with --workers_busy (which is not synchronised by the spin-lock mutex), could the same acquire-release memory order be used with this variable, using a "relaxed" increment? i.e.
void Foo::Modify()
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;
    while( workers_busy.load( std::memory_order_acquire ) != 0 ) ; // <--
    // ....
    locked.clear( std::memory_order_release );
}
void Foo::DoWork( SomeContext& )
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;
    workers_busy.fetch_add( 1, std::memory_order_relaxed ); // <--
    locked.clear( std::memory_order_release );
    // ....
    workers_busy.fetch_sub( 1, std::memory_order_release ); // <--
}
Is this correct? Is it possible for any of these memory orderings to be relaxed further? And does it even matter?
Since you say you're targeting x86 only, you're guaranteed strongly-ordered memory anyway; avoiding memory_order_seq_cst is useful (it can trigger expensive and unnecessary memory fences), but beyond that, most other operations won't impose any special overhead, so you won't gain anything from additional relaxation aside from allowing the compiler to reorder instructions in possibly incorrect ways. This should be safe, and no slower than any other solution using C++11 atomics:
void Foo::Modify()
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;
    while( workers_busy.load( std::memory_order_acquire ) != 0 ) ; // acq to see decrements
    // ....
    locked.clear( std::memory_order_release );
}
void Foo::DoWork( SomeContext& )
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;
    workers_busy.fetch_add( 1, std::memory_order_relaxed );  // lock already provides acquire/release, so relaxed is free
    locked.clear( std::memory_order_release );
    // ....
    workers_busy.fetch_sub( 1, std::memory_order_acq_rel );  // not wrapped by the lock, so it must synchronise on its own: acq_rel
}
At worst, on x86, this imposes some compiler ordering constraints; it shouldn't introduce any additional fences or lock-prefixed instructions beyond what the atomic read-modify-write operations already require.
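As a quick sanity check (an illustrative aside, not something the original code needs), you can verify at run time that the counter is lock-free on your target, so the increments and decrements compile to single atomic RMW instructions rather than a hidden lock:

#include <atomic>
#include <cassert>

int main()
{
    std::atomic<int> workers_busy( 0 );
    assert( workers_busy.is_lock_free() );  // true on x86: fetch_add/fetch_sub are single lock-prefixed RMWs
    // std::atomic_flag is guaranteed lock-free by the standard, so the spin-lock itself needs no check.
    return 0;
}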