
How to mix atomic and non-atomic operations in C++?

Tags:

c++

vectorization

multithreading

c++11

atomic

The std::atomic types allow atomic access to variables, but I would sometimes like non-atomic access, for example when the access is protected by a mutex. Consider a bitfield class that allows both multi-threaded access (via insert) and single-threaded vectorized access (via operator|=):

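#include <atomic>
#include <cassert>
#include <mutex>
#include <new>       // placement new, std::bad_alloc
#include <set>
#include <stdlib.h>  // posix_memalign, free (POSIX)

class Bitfield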
{
    const size_t size_, word_count_;
    std::atomic<size_t> * words_;
    mutable std::mutex mutex_; // mutable so operator|= can lock a const Bitfield

public:

    Bitfield (size_t size) :
        size_(size),
        word_count_((size + 8 * sizeof(size_t) - 1) / (8 * sizeof(size_t)))
    {
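        // make sure words are 32-byte aligned
        void * raw = nullptr; // posix_memalign takes a void **
        if (posix_memalign(&raw, 32, word_count_ * sizeof(std::atomic<size_t>)) != 0) {
            throw std::bad_alloc();
        }
        words_ = static_cast<std::atomic<size_t> *>(raw);
        for (size_t i = 0; i < word_count_; ++i) {
            new(words_ + i) std::atomic<size_t>(0);
        }
    }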
    ~Bitfield () { free(words_); }

private:
    void insert_one (size_t pos)
    {
        size_t mask = size_t(1) << (pos % (8 * sizeof(size_t)));
        std::atomic<size_t> * word = words_ + pos / (8 * sizeof(size_t));
        word->fetch_or(mask, std::memory_order_relaxed);
    }
public:
    void insert (const std::set<size_t> & items)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        // do some sort of multi-threaded insert, with TBB or #pragma omp
        // (parallel_foreach is a placeholder, not a real library call)
        parallel_foreach(items.begin(), items.end(),
                         [this](size_t pos) { insert_one(pos); });
    }

    void operator |= (const Bitfield & other)
    {
        assert(other.size_ == size_);
        std::unique_lock<std::mutex> lock1(mutex_, std::defer_lock);
        std::unique_lock<std::mutex> lock2(other.mutex_, std::defer_lock);
        std::lock(lock1, lock2); // edited to lock other.mutex_ as well
        // allow gcc to autovectorize (256 bits at once with AVX)
        static_assert(sizeof(size_t) == sizeof(std::atomic<size_t>), "fail");
        size_t * __restrict__ words = reinterpret_cast<size_t *>(words_);
        const size_t * __restrict__ other_words
            = reinterpret_cast<const size_t *>(other.words_);
        for (size_t i = 0, end = word_count_; i < end; ++i) {
            words[i] |= other_words[i];
        }
    }
};

Note operator|= is very close to what's in my real code, but insert(std::set) is just attempting to capture the idea that one can

acquire lock;
make many atomic accesses in parallel;
release lock;

My question is this: what is the best way to mix such atomic and non-atomic access? Answers to [1,2] below suggest that casting is wrong (and I agree). But surely the standard allows such apparently safe access?

More generally, can one use a reader-writer-lock and allow "readers" to read and write atomically, and the unique "writer" to read and write non-atomically?

References

[1] How to use std::atomic efficiently
[2] Accessing atomic<int> of C++0x as non-atomic


asked Sep 02 '12 16:09

fritzo

1 Answer

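Standard C++ prior to C++11 had no multithreaded memory model. C++11 added one, but it defines nothing for unsynchronized non-atomic accesses: if two threads access the same location, at least one access is a write, at least one is non-atomic, and nothing orders them, the program has a data race and its behavior is undefined. In that respect such accesses get no better guarantees than in a pre-C++11 environment.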

In theory this is even worse than using memory_order_relaxed: the cross-thread behavior of a racing non-atomic access is completely undefined, whereas a relaxed atomic access merely allows several possible orders of execution, one of which must eventually happen.

So, to implement such patterns while mixing atomic and non-atomic accesses, you will still have to rely on platform-specific, non-standard constructs (for example, MSVC's _ReadBarrier) and/or intimate knowledge of the particular hardware.

A better alternative is to get familiar with the memory_order enum and use the weakest ordering that is still correct, in the hope of getting optimal assembly output from a given piece of code and compiler. The end result may be correct, portable, and free of unwanted memory fences, but expect to disassemble and analyze several buggy versions first, if you are like me. Even then, there is no guarantee that using atomic accesses on all code paths will not produce some superfluous fences on a different architecture or a different compiler.
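
As a minimal sketch of what working with the memory_order enum looks like, assume a hypothetical single-slot Mailbox (not part of the question's code) in which a plain payload is published through one atomic flag. The release store paired with the acquire load is what makes the non-atomic payload access well-defined; weakening both to memory_order_relaxed would reintroduce a data race.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-slot mailbox: one producer publishes once, one consumer polls.
struct Mailbox {
    std::size_t payload = 0;         // plain, non-atomic data
    std::atomic<bool> ready{false};  // the only atomic: the publication flag

    // Producer side: write the payload, then publish it.
    void publish(std::size_t value) {
        // Precondition of this sketch: the slot has not been published yet.
        assert(!ready.load(std::memory_order_relaxed));
        payload = value;                               // non-atomic write
        ready.store(true, std::memory_order_release);  // orders the write before the flag
    }

    // Consumer side: returns true and fills `out` once the payload is published.
    bool try_consume(std::size_t & out) {
        if (!ready.load(std::memory_order_acquire)) {
            return false;  // nothing published yet
        }
        out = payload;     // ordered after the producer's write by acquire/release
        return true;
    }
};
```

The flag is the only cross-thread communication channel here; the payload itself never needs to be a std::atomic.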

So the best practical answer is simplicity first. Design your cross-thread interactions to be as simple as you can make them without completely killing scalability, responsiveness, or any other sacred cow; have nearly no shared mutable data structures; and access them as rarely as you can, always atomically.
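
Applied to the question's bitfield, "always atomically" could look roughly like the sketch below. AtomicBitfield and its members are hypothetical names, the mutex from the question is left out for brevity, and operator|= assumes the caller already holds exclusive access, so a relaxed load followed by a relaxed store is enough there.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified variant of the question's Bitfield: every access
// to the words goes through std::atomic, so no reinterpret_cast is needed.
class AtomicBitfield {
    static constexpr std::size_t bits() { return 8 * sizeof(std::size_t); }
    std::vector<std::atomic<std::size_t>> words_;

public:
    explicit AtomicBitfield(std::size_t size)
        : words_((size + bits() - 1) / bits())
    {
        for (auto & w : words_) w.store(0, std::memory_order_relaxed);
    }

    // Safe to call from many threads at once.
    void insert(std::size_t pos) {
        words_[pos / bits()].fetch_or(std::size_t(1) << (pos % bits()),
                                      std::memory_order_relaxed);
    }

    bool contains(std::size_t pos) const {
        std::size_t word = words_[pos / bits()].load(std::memory_order_relaxed);
        return ((word >> (pos % bits())) & 1) != 0;
    }

    // Assumption: the caller holds exclusive access to both objects
    // (for example via the mutexes in the question).
    AtomicBitfield & operator|=(const AtomicBitfield & other) {
        assert(other.words_.size() == words_.size());
        for (std::size_t i = 0; i < words_.size(); ++i) {
            std::size_t merged = words_[i].load(std::memory_order_relaxed)
                               | other.words_[i].load(std::memory_order_relaxed);
            words_[i].store(merged, std::memory_order_relaxed);
        }
        return *this;
    }
};
```

Whether a compiler vectorizes the relaxed loop is not guaranteed, so this trades the AVX goal of the question for defined behavior; checking the generated assembly is still advisable.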


answered Oct 29 '22 00:10

Jirka Hanika
