I've written a container for a very simple piece of data that needs to be synchronized across threads. I want the top performance. I don't want to use locks.
I want to use "relaxed" atomics. Partly for that little bit of extra oomph, and partly to really understand them.
I've been working on this a lot, and I'm at the point where this code passes all tests I throw at it. That's not quite "proof" though, and so I'm wondering if there's anything I'm missing, or any other ways I can test this?
Here's my premise:
Here's what I'm thinking. "Normally", the way we reason about code that we're reading is to look at the order in which it's written. Memory can be read or written to "out of order", but not in a way that invalidates the correctness of the program.
That changes in a multi-threaded environment. That's what memory fences are for - so that we can still look at the code and be able to reason about how it's going to work.
So if everything can go all out-of-order here, what am I doing with relaxed atomics? Isn't that a bit too far?
I don't think so, but that's why I'm here asking for help.
The compare_exchange operations themselves give a guarantee of sequential constancy with each other.
The only other time there is read or write to an atomic is to get the head's initial value before a compare_exchange. It is set as part of the initialization of a variable. As far as I can tell, it would be irrelevant whether or not this operation brings back a "proper" value.
Current code:
struct node
{
node *n_;
#if PROCESSOR_BITS == 64
inline constexpr node() : n_{ nullptr } { }
inline constexpr node(node* n) : n_{ n } { }
inline void tag(const stack_tag_t t) { reinterpret_cast<stack_tag_t*>(this)[3] = t; }
inline stack_tag_t read_tag() { return reinterpret_cast<stack_tag_t*>(this)[3]; }
inline void clear_pointer() { tag(0); }
#elif PROCESSOR_BITS == 32
stack_tag_t t_;
inline constexpr node() : n_{ nullptr }, t_{ 0 } { }
inline constexpr node(node* n) : n_{ n }, t_{ 0 } { }
inline void tag(const stack_tag_t t) { t_ = t; }
inline stack_tag_t read_tag() { return t_; }
inline void clear_pointer() { }
#endif
inline void set(node* n, const stack_tag_t t) { n_ = n; tag(t); }
};
using std::memory_order_relaxed;
class stack
{
public:
constexpr stack() : head_{}{}
void push(node* n)
{
node next{n}, head{head_.load(memory_order_relaxed)};
do
{
n->n_ = head.n_;
next.tag(head.read_tag() + 1);
} while (!head_.compare_exchange_weak(head, next, memory_order_relaxed, memory_order_relaxed));
}
bool pop(node*& n)
{
node clean, next, head{head_.load(memory_order_relaxed)};
do
{
clean.set(head.n_, 0);
if (!clean.n_)
return false;
next.set(clean.n_->n_, head.read_tag() + 1);
} while (!head_.compare_exchange_weak(head, next, memory_order_relaxed, memory_order_relaxed));
n = clean.n_;
return true;
}
protected:
std::atomic<node> head_;
};
What's different about this question compared to others? Relaxed atomics. They make a big difference to the question.
So, what do you think? Is there anything I'm missing?
push
is broken, since you do not update node->_next
after a compareAndSwap
failure. It's possible that the node you originally stored with node->setNext
has been popped from the top of stack by another thread when the next compareAndSwap
attempt succeeds. As a result, some thread thinks it has popped a node from the stack but this thread has put it back in the stack. It should be:
void push(Node* node) noexcept
{
Node* n = _head.next();
do {
node->setNext(n);
} while (!_head.compareAndSwap(n, node));
}
Also, since next
and setNext
use memory_order_relaxed
, there's no guarantee that _head_.next()
here is returning the node most recently pushed. It's possible to leak nodes from the top of the stack. The same problem obviously exists in pop
as well: _head.next()
may return a node that was previously but is no longer at the top of the stack. If the returned value is nullptr
, you may fail to pop when the stack is not actually empty.
pop
can also have undefined behavior if two threads try to pop the last node from the stack at the same time. They both see the same value for _head.next()
, one thread successfully completes pop. The other thread enters the while loop - since the observed node pointer is not nullptr
- but the compareAndSwap
loop soon updates it to nullptr
since the stack is now empty. On the next iteration of the loop, that nullptr is dererenced to get its _next
pointer and much hilarity ensues.
pop
is also clearly suffering from ABA. Two threads can see the same node at the top of the stack. Say one thread gets to the point of evaluating the _next
pointer and then blocks. The other thread successfully pops the node, pushes 5 new nodes, and then pushes that original node again all before the other thread wakes. That other thread's compareAndSwap
will succeed - the top-of-stack node is the same - but store the old _next
value into _head
instead of the new one. The five nodes pushed by the other thread are all leaked. This would be the case with memory_order_seq_cst
as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With