Thread synchronization problem with C++ std::atomic variables

Tags: c++11, atomic

The following program behaves unexpectedly: once in a while it prints "bad". The two threads are supposed to synchronize through the two std::atomic variables s_lock1 and s_lock2. For func2 to set s_var to 1, it must have atomically stored a non-zero value in s_lock2 while the other thread (func1) had not yet updated s_lock1. Yet func1 still occasionally prints "bad", as if its s_lock2.load() had returned 0 even though func2 had already stored 1. Is there something wrong with this code snippet? Is it an issue related to memory ordering?

I am running this on an 8-core Linux server with CentOS 7 installed. Any help is greatly appreciated.

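#include <atomic>
#include <cstdio>   // printf
#include <thread>
#include <unistd.h> // usleep

// Brace initialization: std::atomic is not copyable, so "= 0" is ill-formed before C++17.
std::atomic_uint s_lock1{0};
std::atomic_uint s_lock2{0};
std::atomic_uint s_var{0};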

static void func1()
{
    while (true) {
        s_lock1.store(1, std::memory_order_release);
        if (s_lock2.load(std::memory_order_acquire) != 0) {
            s_lock1.store(0, std::memory_order_release);
            continue;
        }
        if (s_var.load(std::memory_order_acquire) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_release);
    }
}

static void func2()
{
    while (true) {
        s_lock2.store(1, std::memory_order_release);
        if (s_lock1.load(std::memory_order_acquire) != 0) {
            s_lock2.store(0, std::memory_order_release);
            continue;
        }
        s_var.store(1, std::memory_order_release);
        usleep(5000);
        s_var.store(0, std::memory_order_release);
        s_lock2.store(0, std::memory_order_release);
    }
}

int main()
{
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}
freedo asked May 14 '19 11:05


1 Answer

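This locking algorithm can break because of the store buffers in Intel CPUs: a store does not go into the level 1 cache directly but sits in the store buffer for a while, and during that time it is invisible to other CPUs. Each thread can therefore load the other thread's lock as 0 while its own store of 1 is still buffered, so both threads get past the check: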

To allow performance optimization of instruction execution, the IA-32 architecture allows departures from strong-ordering model called processor ordering in Pentium 4, Intel Xeon, and P6 family processors. These processor-ordering variations (called here the memory-ordering model) allow performance enhancing operations such as allowing reads to go ahead of buffered writes. The goal of any of these variations is to increase instruction execution speeds, while maintaining memory coherency, even in multiple-processor systems.
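The effect described above is commonly demonstrated with a "store buffering" litmus test. The following is a minimal sketch, not code from the question or the answer: count_both_zero and all of its variable names are made up for illustration. Each worker stores 1 to its own flag and then loads the other flag, and the function counts the rounds in which both loads returned 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs `rounds` rounds of the store-buffering pattern
// with the given memory orders and returns how many rounds ended with BOTH
// workers loading 0 from the other worker's flag.
int count_both_zero(std::memory_order store_order,
                    std::memory_order load_order, int rounds) {
    assert(rounds > 0);
    std::atomic<unsigned> x{0}, y{0};
    std::atomic<int> go{0};    // number of the round the workers may run
    std::atomic<int> done{0};  // workers finished in the current round
    unsigned r1 = 0, r2 = 0;   // value each worker loaded in this round

    auto worker = [&](std::atomic<unsigned>& mine,
                      std::atomic<unsigned>& other, unsigned& seen) {
        for (int round = 1; round <= rounds; ++round) {
            while (go.load(std::memory_order_acquire) != round)
                std::this_thread::yield();
            mine.store(1, store_order);    // same shape as s_lockN.store(1, ...)
            seen = other.load(load_order); // same shape as s_lockM.load(...)
            done.fetch_add(1, std::memory_order_acq_rel);
        }
    };
    std::thread t1([&] { worker(x, y, r1); });
    std::thread t2([&] { worker(y, x, r2); });

    int both_zero = 0;
    for (int round = 1; round <= rounds; ++round) {
        // Reset the flags, then release both workers into the round.
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        done.store(0, std::memory_order_relaxed);
        go.store(round, std::memory_order_release);
        while (done.load(std::memory_order_acquire) != 2)
            std::this_thread::yield();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    t1.join();
    t2.join();
    return both_zero;
}
```

Called with std::memory_order_release and std::memory_order_acquire, this may return a non-zero count on x86, although how often (or whether) it does in a given run depends on timing. Called with std::memory_order_seq_cst for both arguments, the C++ memory model requires the result to be 0.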

For this locking to work, the store buffer has to be drained before the subsequent load executes. Use std::memory_order_seq_cst for the stores to the locks; it is the default memory order for loads and stores, so you can just write s_lock1 = 1;, for example. For a std::memory_order_seq_cst store on x86, the compiler generates an xchg instruction or inserts an mfence instruction after the store, both of which make the store visible to other CPUs before any later load executes:

Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged. Sequential ordering may be necessary for multiple producer-multiple consumer situations where all consumers must observe the actions of all producers occurring in the same order. Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems. This may become a performance bottleneck since it forces the affected memory accesses to propagate to every core.
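As an alternative to tagging the lock operations themselves, the full fence can be written explicitly with std::atomic_thread_fence between the store and the load. This is only a sketch of that variant, not what the answer uses: try_enter, leave and count_violations are hypothetical names, and the bounded loop stands in for the infinite loops in the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: try once to enter the critical section guarded by
// the pair of flags. Returns false if the other thread holds its flag.
bool try_enter(std::atomic<unsigned>& mine, std::atomic<unsigned>& other) {
    mine.store(1, std::memory_order_relaxed);
    // Full fence: the store above becomes visible before the load below runs.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (other.load(std::memory_order_acquire) != 0) {
        mine.store(0, std::memory_order_release);  // back off
        return false;
    }
    return true;
}

// Hypothetical helper: leave the critical section.
void leave(std::atomic<unsigned>& mine) {
    mine.store(0, std::memory_order_release);
}

// Two threads make `attempts` tries each. Returns how many times a thread
// entered the critical section while the other one was still inside.
int count_violations(int attempts) {
    assert(attempts > 0);
    std::atomic<unsigned> lock1{0}, lock2{0};
    std::atomic<int> inside{0};      // threads currently in the section
    std::atomic<int> violations{0};

    auto worker = [&](std::atomic<unsigned>& mine,
                      std::atomic<unsigned>& other) {
        for (int i = 0; i < attempts; ++i) {
            if (!try_enter(mine, other)) continue;
            if (inside.fetch_add(1, std::memory_order_relaxed) != 0)
                violations.fetch_add(1, std::memory_order_relaxed);
            inside.fetch_sub(1, std::memory_order_relaxed);
            leave(mine);
        }
    };
    std::thread t1([&] { worker(lock1, lock2); });
    std::thread t2([&] { worker(lock2, lock1); });
    t1.join();
    t2.join();
    return violations.load();
}
```

With the fence in place, count_violations should return 0. Removing the fence reintroduces the store-then-load reordering, so a non-zero result becomes possible.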

Working example:

#include <atomic>
#include <cstdio>   // printf
#include <thread>
#include <unistd.h> // usleep

std::atomic<unsigned> s_lock1{0};
std::atomic<unsigned> s_lock2{0};
std::atomic<unsigned> s_var{0};

void func1() {
    while(true) {
        s_lock1.store(1, std::memory_order_seq_cst);
        if(s_lock2.load(std::memory_order_seq_cst) != 0) {
            s_lock1.store(0, std::memory_order_seq_cst);
            continue;
        }
        if(s_var.load(std::memory_order_relaxed) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_seq_cst);
    }
}

void func2() {
    while(true) {
        s_lock2.store(1, std::memory_order_seq_cst);
        if(s_lock1.load(std::memory_order_seq_cst) != 0) {
            s_lock2.store(0, std::memory_order_seq_cst);
            continue;
        }
        s_var.store(1, std::memory_order_relaxed);
        usleep(5000);
        s_var.store(0, std::memory_order_relaxed);
        s_lock2.store(0, std::memory_order_seq_cst);
    }
}

int main() {
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}
Maxim Egorushkin answered Oct 21 '22 13:10