Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

update integer array elements atomically C++

Given a shared array of integer counters, I am interested to know if a thread can atomically fetch and add an array element without locking the entire array?

Here's an illustration of working model that uses mutex to lock access to the entire array.

// thread-shared class members
std::mutex count_array_mutex_;
std::vector<int> counter_array_( 100ish );

// Thread critical section
int counter_index = ... // unpredictable index
int current_count;
{
  std::lock_guard<std::mutex> lock(count_array_mutex_);
  current_count = counter_array_[counter_index] ++;
}
// ... do stuff using current_count.

I'd like multiple threads to be able to fetch-add separate array elements simultaneously.

So far, in my research of std::atomic<int> I'm thrown off that constructing the atomic object also constructs the protected member. (And plenty of answers explaining why you can't make a std::vector<std::atomic<int> > )

like image 471
NoahR Avatar asked Feb 05 '19 15:02

NoahR


Video Answer


2 Answers

C++20 / C++2a (or whatever you want to call it) will add std::atomic_ref<T> which lets you do atomic operations on an object that wasn't atomic<T> to start with.

It's not available yet as part of the standard library for most compilers, but there is a working implementation for gcc/clang/ICC / other compilers with GNU extensions.

Previously, atomic access to "plain" data was only available with some platform-specific functions like Microsoft's LONG InterlockedExchange(LONG volatile *Target, LONG Value); or GNU C / C++
type __atomic_add_fetch (type *ptr, type val, int memorder) (the same builtins that C++ libraries for GNU compilers use to implement std::atomic<T>.)

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0019r8.html includes some intro stuff about the motivation. CPUs can easily do this, compilers can already do this, and it's been annoying that C++ didn't expose this capability portably.

So instead of having to wrestle with C++ to get all the non-atomic allocation and init done in a constructor, you can just have every access create an atomic_ref to the element you want to access. (It's free to instantiate as a local, at least when it's lock-free, on any "normal" C++ implementations).

This will even let you do things like resize the std::vector<int> after you've ensured no other threads are accessing the vector elements or the vector control block itself. And then you can signal the other threads to resume.

It's not yet implemented in libstdc++ or libc++ for gcc/clang.

#include <vector>
#include <atomic>

#define Foo std   // this atomic_ref.hpp puts it in namespace Foo, not std.
// current raw url for https://github.com/ORNL/cpp-proposals-pub/blob/master/P0019/atomic_ref.hpp
#include "https://raw.githubusercontent.com/ORNL/cpp-proposals-pub/580934e3b8cf886e09accedbb25e8be2d83304ae/P0019/atomic_ref.hpp"


void inc_element(std::vector<int> &v, size_t idx)
{
    v[idx]++;
}

void atomic_inc_element(std::vector<int> &v, size_t idx)
{
    std::atomic_ref<int> elem(v[idx]);
    static_assert(decltype(elem)::is_always_lock_free,
           "performance is going to suck without lock-free atomic_ref<T>");

    elem.fetch_add(1, std::memory_order_relaxed);  // take your pick of memory order here
}

For x86-64, these compile exactly the way we'd hope with GCC, using the sample implementation (for compilers implementing GNU extensions) linked in the C++ working-group proposal. https://github.com/ORNL/cpp-proposals-pub/blob/master/P0019/atomic_ref.hpp

From the Godbolt compiler explorer with g++8.2 -Wall -O3 -std=gnu++2a:

inc_element(std::vector<int, std::allocator<int> >&, unsigned long):
    mov       rax, QWORD PTR [rdi]          # load the pointer member of std::vector
    add       DWORD PTR [rax+rsi*4], 1      # and index it as a memory destination
    ret

atomic_inc_element(std::vector<int, std::allocator<int> >&, unsigned long):
    mov       rax, QWORD PTR [rdi]
    lock add  DWORD PTR [rax+rsi*4], 1     # same but atomic RMW
    ret

The atomic version is identical except it uses a lock prefix to make the read-modify-write atomic, by making sure no other core can read or write the cache line while this core is in the middle of atomically modifying it. Just in case you were curious how atomics work in asm.

Most non-x86 ISAs like AArch64 of course require a LL/SC retry loop to implement an atomic RMW, even with relaxed memory order.

The point here is that constructing / destructing the atomic_ref doesn't cost anything. Its member pointer fully optimizes away. So this is exactly as cheap as a vector<atomic<int>>, but without the headache.

As long as you're careful not to create data-race UB by resizing the vector, or accessing an element without going through atomic_ref. (It would potentially manifest as a use-after-free on many real implementations if std::vector reallocated the memory in parallel with another thread indexing into it, and of course you'd be atomically modifying a stale copy.)

This definitely gives you rope to hang yourself if you don't carefully respect the fact that the std::vector object itself is not atomic, and also that the compiler won't stop you from doing non-atomic access to the underlying v[idx] after other threads have started using it.

like image 103
Peter Cordes Avatar answered Dec 01 '22 19:12

Peter Cordes


One way:

// Create.
std::vector<std::atomic<int>> v(100);
// Initialize.
for(auto& e : v)
    e.store(0, std::memory_order_relaxed);

// Atomically increment.
auto unpredictable_index = std::rand() % v.size();
int old = v[unpredictable_index].fetch_add(1, std::memory_order_relaxed);

Note that std::atomic<> copy-constructor is deleted, so that the vector cannot be resized and needs to be initialized with the final count of elements.

Since resize functionality of std::vector is lost, instead of std::vector you may as well use std::unique_ptr<std::atomic<int>[]>, e.g.:

// Create.
unsigned const N = 100;
std::unique_ptr<std::atomic<int>[]> p(new std::atomic<int>[N]);
// Initialize.
for(unsigned i = 0; i < N; ++i)
    p[i].store(0, std::memory_order_relaxed);

// Atomically increment.
auto unpredictable_index = std::rand() % N;
int old = p[unpredictable_index].fetch_add(1, std::memory_order_relaxed);
like image 42
Maxim Egorushkin Avatar answered Dec 01 '22 19:12

Maxim Egorushkin