
Synchronization riddle with std::memory_order and three threads

Tags:

c++

c++11

atomic

Here is a problem about the std::memory_order rules in C++11 when three threads are involved. Say one thread, producer, saves a value and sets a flag. Then another thread, relay, waits for this flag before setting a second flag. Finally, a third thread, consumer, waits for the flag from relay, which should signal that the data is ready for consumer.

Here is a minimal program, in the style of the examples in the C++ reference (http://en.cppreference.com/w/cpp/atomic/memory_order):

#include <thread>
#include <atomic>
#include <cassert>

std::atomic<bool> flag1 = ATOMIC_VAR_INIT(false);
std::atomic<bool> flag2 = ATOMIC_VAR_INIT(false);
int data;

void producer()
{
  data = 42;
  flag1.store(true, std::memory_order_release);
}

void relay_1()
{
  while (!flag1.load(std::memory_order_acquire))
    ;
  flag2.store(true, std::memory_order_release);
}

void relay_2()
{
  while (!flag1.load(std::memory_order_seq_cst))
    ;
  flag2.store(true, std::memory_order_seq_cst);
}

void relay_3()
{
  while (!flag1.load(std::memory_order_acquire))
    ;
  // Does the following line make a difference?
  data = data;
  flag2.store(true, std::memory_order_release);
}

void consumer()
{
  while (!flag2.load(std::memory_order_acquire))
    ;
  assert(data==42);
}

int main()
{
  std::thread a(producer);
  std::thread b(relay_1);
  std::thread c(consumer);
  a.join(); b.join(); c.join();
}

Comments:

  1. The first function relay_1() is insufficient and can trigger the assert in consumer. According to the C++ reference cited above, memory_order_acquire “ensures that all writes in other threads that release the same atomic variable are visible in the current thread”. So the write data = 42 is visible to relay when it sets flag2. It sets flag2 with memory_order_release, which “ensures that all writes in the current thread are visible in other threads that acquire the same atomic variable”. However, data has not been touched by relay, so consumer might see the memory accesses in a different order, and data may not yet hold 42 when consumer sees flag2 == true.

  2. The same argument applies to the stricter memory orderings in relay_2(). The sequentially-consistent ordering implies that “the synchronization is established between all atomic operations tagged std::memory_order_seq_cst”. However, that doesn't say anything about the variable data.

    Or do I understand something wrong here and relay_2() is sufficient?

  3. Let's resolve the situation by adding an access to data, as in relay_3(). Here, the line data = data implies that data is read after flag1 has become true, and that the relay thread writes to data before it sets flag2. Hence, the consumer thread must see the correct value.

    However, this solution seems a little strange. The line data = data looks like something that a compiler would (in sequential code) immediately optimize away.

    Does the dummy line do the trick here? Any better way to achieve synchronization across three threads with the C++11 std::memory_order features?

By the way, this is not an academic question. Imagine that data is a big block of data instead of a single integer, and that the i-th thread needs to tell the (i+1)-th thread up to which element the data has already been processed by all threads with index ≤ i.
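For concreteness, here is a minimal sketch of such a pipeline (the names buffer, done_1, done_2 and stage_1 through stage_3 are mine, not part of the original problem). Each stage publishes, with a release store, the index up to which it has finished, and the next stage acquires that index before touching those elements:

#include <thread>
#include <atomic>
#include <cassert>

const int N = 1000;
int buffer[N];
std::atomic<int> done_1 = ATOMIC_VAR_INIT(0);   // elements finished by stage 1
std::atomic<int> done_2 = ATOMIC_VAR_INIT(0);   // elements finished by stage 2

void stage_1()
{
  for (int i = 0; i < N; ++i) {
    buffer[i] = i;                                   // produce element i
    done_1.store(i + 1, std::memory_order_release);  // publish progress
  }
}

void stage_2()
{
  for (int i = 0; i < N; ++i) {
    while (done_1.load(std::memory_order_acquire) <= i)
      ;                                              // wait for stage 1
    buffer[i] *= 2;                                  // process element i
    done_2.store(i + 1, std::memory_order_release);  // relay progress
  }
}

void stage_3()
{
  for (int i = 0; i < N; ++i) {
    while (done_2.load(std::memory_order_acquire) <= i)
      ;                                              // wait for stage 2
    assert(buffer[i] == 2 * i);                      // the release/acquire chain makes this hold
  }
}

int main()
{
  std::thread a(stage_1), b(stage_2), c(stage_3);
  a.join(); b.join(); c.join();
}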

Edit:

After reading Michael Burr's answer it is clear that relay_1() is sufficient. Please read his post for a completely satisfying solution to the problem. The C++11 standard gives stricter guarantees than what can be inferred from the cppreference.com web site alone. Therefore, consider the argumentation in Michael Burr's post as authoritative, not my comments above. The way to go is to establish an “inter-thread happens-before” relation (which is transitive) between the events in question.

Asked Apr 19 '13 by Daniel

1 Answer

I think that relay_1() is sufficient to pass the value 42 from the producer to the consumer through data.

To show this, first I'll give single-letter names to the operations of interest:

void producer()
{
    /* P */ data = 42;
    /* Q */ flag1.store(true, std::memory_order_release);
}

void relay_1()
{
  while (/* R */ !flag1.load(std::memory_order_acquire))
    ;

  /* S */ flag2.store(true, std::memory_order_release);
}


void consumer()
{
  while (/* T */ !flag2.load(std::memory_order_acquire))
    ;
  /* U */ assert(data==42);
}

I'm going to use the notation A -> B to mean that "A inter-thread happens before B" (C++11 1.10/11).

I argue that P is a visible side-effect with respect to U because:

  • P is sequenced before Q, R is sequenced before S, and T is sequenced before U (1.9/14)
  • Q synchronizes with R, and S synchronizes with T (29.3/2)

All of the following points are supported by the definition of "inter-thread happens before" (1.10/11):

  • Q -> S since the standard says "A inter-thread happens before an evaluation B if ... for some evaluation X, A synchronizes with X and X is sequenced before B" (Q synchronizes with R and R is sequenced before S, so Q -> S)

  • S -> U following similar logic (S synchronizes with T and T is sequenced before U, so S -> U)

  • Q -> U because Q -> S and S -> U ("A inter-thread happens before an evaluation B if ... A inter-thread happens before X and X inter-thread happens before B")

And finally,

  • P -> U because P is sequenced before Q and Q -> U ("A inter-thread happens before an evaluation B if ... A is sequenced before X and X inter-thread happens before B")
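In compact form (sb = "sequenced before", sw = "synchronizes with"), the whole chain is:

  P --sb--> Q --sw--> R --sb--> S --sw--> T --sb--> U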

Since P inter-thread happens before U, P happens before U (1.10/12) and P is a "visible side-effect" with respect to U (1.10/13).

relay_3() is also sufficient because the data=data expression is irrelevant.

And for the purpose of this producer/consumer problem, relay_2() is at least as good as relay_1(), because for a store operation memory_order_seq_cst acts as a release and for a load operation memory_order_seq_cst acts as an acquire (see 29.3/1), so exactly the same logic can be followed. Operations using memory_order_seq_cst have some additional properties concerning how all memory_order_seq_cst operations are ordered with respect to one another, but those properties don't come into play in this example.
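To illustrate what those extra properties buy, here is the classic "store buffering" pattern (this example is not part of the question; the names x, y and z are mine). With plain release stores and acquire loads, both threads may read false and z can end up incremented twice; with memory_order_seq_cst throughout, the single total order over all seq_cst operations forbids that outcome:

#include <thread>
#include <atomic>
#include <cassert>

std::atomic<bool> x = ATOMIC_VAR_INIT(false);
std::atomic<bool> y = ATOMIC_VAR_INIT(false);
std::atomic<int>  z = ATOMIC_VAR_INIT(0);

void write_x_read_y()
{
  x.store(true, std::memory_order_seq_cst);
  if (!y.load(std::memory_order_seq_cst))
    ++z;                       // this thread saw y == false
}

void write_y_read_x()
{
  y.store(true, std::memory_order_seq_cst);
  if (!x.load(std::memory_order_seq_cst))
    ++z;                       // this thread saw x == false
}

int main()
{
  std::thread a(write_x_read_y);
  std::thread b(write_y_read_x);
  a.join(); b.join();
  // z == 2 would require both loads to have read false, which the total
  // order of the seq_cst operations rules out; z may end up 0 or 1.
  assert(z.load() != 2);
}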

I think that memory_order_acquire and memory_order_release would not be very useful for implementing higher level synchronization objects if there weren't a transitive behavior such as this.
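As a rough illustration of that point, here is a hypothetical sketch of such a higher-level object (the event class is my own, not a standard facility): a tiny one-shot flag built on release/acquire. Because "inter-thread happens before" is transitive, chaining two such events through a relay thread is enough to publish ordinary data from the producer to the consumer, exactly as in relay_1():

#include <thread>
#include <atomic>
#include <cassert>

class event
{
  std::atomic<bool> signaled_{false};
public:
  void set() { signaled_.store(true, std::memory_order_release); }
  void wait() const
  {
    while (!signaled_.load(std::memory_order_acquire))
      ;                        // spin until set() has been observed
  }
};

event e1, e2;
int payload;

int main()
{
  std::thread producer([]{ payload = 42; e1.set(); });
  std::thread relay   ([]{ e1.wait(); e2.set(); });
  std::thread consumer([]{ e2.wait(); assert(payload == 42); });
  producer.join(); relay.join(); consumer.join();
}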

Answered by Michael Burr