I'm trying to familiarize myself with c++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning/self improvement than due to any real need). my class looks basically as followed:
class barrier
{
private:
std::atomic<int> counter[2];
std::atomic<int> lock[2];
std::atomic<int> cur_idx;
int thread_count;
public:
//constructors...
bool wait();
};
All members are initialized to zero, except thread_count, which holds the appropriate count. I have implemented the wait function as
int idx = cur_idx.load();
if(lock[idx].load() == 0)
{
lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if(val >= thread_count - 1)
{
counter[idx].store(0);
cur_idx.fetch_xor(1);
lock[idx].store(0);
return true;
}
while(lock[idx].load() == 1);
return false;
However when trying to use it with two threads (thread_count
is 2) whe first thread gets in the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);
, but I'm not too sure about that. However when I'm using gcc atomic-intrinsics by using volatile int
instead of std::atomic<int>
and writing wait
as followed:
int idx = cur_idx;
if(lock[idx] == 0)
{
__sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if(val >= thread_count - 1)
{
__sync_synchronize();
counter[idx] = 0;
cur_idx ^= 1;
__sync_synchronize();
lock[idx] = 0;
__sync_synchronize();
return true;
}
while(lock[idx] == 1);
return false;
it works just fine. From my understanding there shouldn't be any fundamental differences between the two versions (more to the point if anything the second should be less likely to work). So which of the following scenarios applies?
std::atomic
and there is a problem with the first variant (but not the second)For the record I'm using 32bit mingw with gcc 4.6.1
The calling code looks like this:
spin_barrier b(2);
std::thread t([&b]()->void
{
std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
b.wait();
});
b.wait();
t.join();
Since mingw doesn't whave <thread>
headers jet I use a self written version for that which basically wraps the appropriate pthread functions (before someone asks: yes it works without the barrier, so it shouldn't be a problem with the wrapping)
Any insights would be appreciated.
edit: Explanation for the algorithm to make it clearer:
thread_count
is the number of threads which shall wait for the barrier (so if thread_count
threads are in the barrier all can leave the barrier).lock
is set to one when the first (or any) thread enters the barrier.counter
counts how many threads are inside the barrier and is atomically incremented once for each threadif counter>=thread_count
all threads are inside the barrier so counter and lock are reset to zerolock
to become zerocounter
, lock
) are used ensure there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier is lifted)edit2:
I have now tested it using gcc 4.5.1 under linux, where both versions seem to work just fine, which seems to point to a problem with mingw's std::atomic
, but I'm still not completely convinced, since looking into the <atomic>
header revaled that most functions simply call the appropriate gcc-atomic meaning there really shouldn't bea difference between the two versions
I have no idea if this is going to be of help, but the following snippet from Herb Sutter's implementation of a concurrent queue uses a spinlock based on atomics:
std::atomic<bool> consumerLock;
{ // the critical section
while (consumerLock.exchange(true)) { } // this is the spinlock
// do something useful
consumerLock = false; // unlock
}
In fact, the Standard provides a purpose-built type for this construction that is required to have lock-free operations, std::atomic_flag
. With that, the critical section would look like this:
std::atomic_flag consumerLock;
{
// critical section
while (consumerLock.test_and_set()) { /* spin */ }
// do stuff
consumerLock.clear();
}
(You can use acquire and release memory ordering there if you prefer.)
It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it:))) :
#include <atomic>
class spinning_barrier
{
public:
spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_(0) {}
bool wait ()
{
unsigned int step = step_.load ();
if (nwait_.fetch_add (1) == n_ - 1)
{
/* OK, last thread to come. */
nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
step_.fetch_add (1);
return true;
}
else
{
/* Run in circles and scream like a little girl. */
while (step_.load () == step)
;
return false;
}
}
protected:
/* Number of synchronized threads. */
const unsigned int n_;
/* Number of threads currently spinning. */
std::atomic<unsigned int> nwait_;
/* Number of barrier syncronizations completed so far,
* it's OK to wrap. */
std::atomic<unsigned int> step_;
};
EDIT: @Grizzy, I can't find any errors in your first (C++11) version and I've also run it for like a hundred million syncs with two threads and it completes. I've run it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3. - the library (or rather, its port to win32) is not mature enough.
Here is an elegant solution from the book C++ Concurrency in Action: Practical Multithreading.
struct bar_t {
unsigned const count;
std::atomic<unsigned> spaces;
std::atomic<unsigned> generation;
bar_t(unsigned count_) :
count(count_), spaces(count_), generation(0)
{}
void wait() {
unsigned const my_generation = generation;
if (!--spaces) {
spaces = count;
++generation;
} else {
while(generation == my_generation);
}
}
};
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With