Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to mix atomic and non-atomic operations in C++?

The std::atomic types allow atomic access to variables, but I would sometimes like non-atomic access, for example when the access is protected by a mutex. Consider a bitfield class that allows both multi-threaded access (via insert) and single-threaded vectorized access (via operator|=):

class Bitfield
{
    const size_t size_, word_count_;
    std::atomic<size_t> * words_;
    std::mutex mutex_;

public:

    Bitfield (size_t size) :
        size_(size),
        word_count_((size + 8 * sizeof(size_t) - 1) / (8 * sizeof(size_t)))
    {
        // make sure words are 32-byte aligned
        posix_memalign(&words_, 32, word_count_ * sizeof(size_t));
        for (int i = 0; i < word_count_; ++i) {
            new(words_ + i) std::atomic<size_t>(0);
        }
    }
    ~Bitfield () { free(words_); }

private:
    void insert_one (size_t pos)
    {
        size_t mask = size_t(1) << (pos % (8 * sizeof(size_t)));
        std::atomic<size_t> * word = words_ + pos / (8 * sizeof(size_t));
        word->fetch_or(mask, std::memory_order_relaxed);
    }
public:
    void insert (const std::set<size_t> & items)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        // do some sort of muti-threaded insert, with TBB or #pragma omp
        parallel_foreach(items.begin(), items.end(), insert_one);
    }

    void operator |= (const Bitfield & other)
    {
        assert(other.size_ == size_);
        std::unique_lock<std::mutex> lock1(mutex_, defer_lock);
        std::unique_lock<std::mutex> lock2(other.mutex_, defer_lock);
        std::lock(lock1, lock2); // edited to lock other_.mutex_ as well
        // allow gcc to autovectorize (256 bits at once with AVX)
        static_assert(sizeof(size_t) == sizeof(std::atomic<size_t>), "fail");
        size_t * __restrict__ words = reinterpret_cast<size_t *>(words_);
        const size_t * __restrict__ other_words
            = reinterpret_cast<const size_t *>(other.words_);
        for (size_t i = 0, end = word_count_; i < end; ++i) {
            words[i] |= other_words[i];
        }
    }
};

Note operator|= is very close to what's in my real code, but insert(std::set) is just attempting to capture the idea that one can

acquire lock;
make many atomic accesses in parallel;
release lock;

My question is this: what is the best way to mix such atomic and non-atomic access? Answers to [1,2] below suggest that casting is wrong (and I agree). But surely the standard allows such apparently safe access?

More generally, can one use a reader-writer-lock and allow "readers" to read and write atomically, and the unique "writer" to read and write non-atomically?

References

  1. How to use std::atomic efficiently
  2. Accessing atomic<int> of C++0x as non-atomic
like image 758
fritzo Avatar asked Sep 02 '12 16:09

fritzo


People also ask

Is i ++ an atomic C?

It's not atomic in C. If it happens to compile to an atomic operation on some platform, it's not because the C standard made it so, it's because the implementation did. @ Steve: No, in C, increment is not atomic.

Can atomic operations be reordered?

The problem is that atomic operations on their own don't prevent reordering. We need an additional concept for atomics to do this. In C11, atomic operations take in another parameter called "memory ordering" which helps solve this problem.

How do you ensure atomic operations?

There are two ways to accomplish an atomic operation: Use a single machine instruction. Tell the operating system in advance not to interrupt the operation.

Can atomic operations be interrupted by the CPU?

If an operation requires multiple CPU instructions, then it may be interrupted in the middle of executing. If this results in a context switch (or if the interrupt handler refers to data that was being used) then atomicity could be compromised.


1 Answers

Standard C++ prior to C++11 had no multithreaded memory model. I see no changes in the standard that would define the memory model for non-atomic accesses, so those get similar guarantees as in a pre-C++11 environment.

It is actually theoretically even worse than using memory_order_relaxed, because the cross thread behavior of non-atomic accesses is simply completely undefined as opposed to multiple possible orders of execution one of which must eventually happen.

So, to implement such patterns while mixing atomic and non-atomic accesses, you will still have to rely on platform specific non-standard constructs (for example, _ReadBarrier) and/or intimate knowledge of particular hardware.

A better alternative is to get familiar with the memory_order enum and hope to achieve optimum assembly output with a given piece of code and compiler. The end result may be correct, portable, and contain no unwanted memory fences, but you should expect to disassemble and analyze several buggy versions first, if you are like me; and there will still be no guarantee that the use of atomic accesses on all code paths will not result in some superfluous fences on a different architecture or a different compiler.

So the best practical answer is simplicity first. Design your cross-thread interactions as simple as you can make it without completely killing scalability, responsiveness or any other holy cow; have nearly no shared mutable data structures; and access them as rarely as you can, always atomically.

like image 159
Jirka Hanika Avatar answered Oct 29 '22 00:10

Jirka Hanika