Synchronizing against relaxed atomics

I have an allocator that uses relaxed atomic operations to track the number of bytes currently allocated. The operations are just adds and subtracts, so I don't need any synchronization between threads beyond ensuring the modifications are atomic.

However, I occasionally want to check the number of allocated bytes (e.g. when shutting down the program) and I want to ensure any pending writes are committed. I assume I need a full memory barrier in this case to prevent any previous writes from being moved after the barrier and to prevent the next read from being moved before the barrier.

The question is: what is the proper way to ensure the relaxed atomic writes are committed before reading? Is my current code correct? (Assume functions and types map to std library constructs as expected.)

void* Allocator::Alloc(size_t bytes, size_t alignment)
{
    void* p = AlignedAlloc(bytes, alignment);
    AtomicFetchAdd(&allocatedBytes, AlignedMsize(p), MemoryOrder::Relaxed);
    return p;
}

void Allocator::Free(void* p)
{
    AtomicFetchSub(&allocatedBytes, AlignedMsize(p), MemoryOrder::Relaxed);
    AlignedFree(p);
}

size_t Allocator::GetAllocatedBytes()
{
    AtomicThreadFence(MemoryOrder::AcqRel);
    return AtomicLoad(&allocatedBytes, MemoryOrder::Relaxed);
}

And some type definitions for context:

enum struct MemoryOrder
{
    Relaxed = 0,
    Consume = 1,
    Acquire = 2,
    Release = 3,
    AcqRel = 4,
    SeqCst = 5,
};

struct Allocator
{
    void*  Alloc            (size_t bytes, size_t alignment);
    void   Free             (void* p);
    size_t GetAllocatedBytes();

    Atomic<size_t> allocatedBytes = { 0 };
};
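In case it helps, here is a minimal, self-contained sketch of what I assume those wrappers boil down to in terms of the standard library (RecordAlloc, RecordFree, and main are illustration-only names, not part of my real code):

// Assumed mapping: Atomic<T> -> std::atomic<T>, MemoryOrder::X -> std::memory_order_x,
// AtomicThreadFence -> std::atomic_thread_fence.
#include <atomic>
#include <cstddef>
#include <thread>

std::atomic<std::size_t> allocatedBytes{ 0 };

void RecordAlloc(std::size_t n)   // the fetch-add done in Alloc()
{
    allocatedBytes.fetch_add(n, std::memory_order_relaxed);
}

void RecordFree(std::size_t n)    // the fetch-sub done in Free()
{
    allocatedBytes.fetch_sub(n, std::memory_order_relaxed);
}

std::size_t GetAllocatedBytes()   // the load I'm asking about
{
    std::atomic_thread_fence(std::memory_order_acq_rel);
    return allocatedBytes.load(std::memory_order_relaxed);
}

int main()
{
    std::thread t([] { RecordAlloc(64); RecordFree(64); });
    t.join();
    return GetAllocatedBytes() == 0 ? 0 : 1;
}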

I don't want to simply default to sequential consistency as I'm trying to understand memory ordering better.

The part that's really tripping me up is that in the standard under [atomics.fences] all the points talk about an acquire fence/atomic op synchronizing with a release fence/atomic op. It's entirely opaque to me whether an acquire fence/atomic op will synchronize with a relaxed atomic op on another thread. If an AcqRel fence function literally maps to an mfence instruction, it seems that the above code will be fine. However, I'm having a hard time convincing myself the standard guarantees this. Namely,

4 An atomic operation A that is a release operation on an atomic object M synchronizes with an acquire fence B if there exists some atomic operation X on M such that X is sequenced before B and reads the value written by A or a value written by any side effect in the release sequence headed by A.

This seems to make it clear that the fence will not synchronize with the relaxed atomic writes. On the other hand, a full fence is both a release and an acquire fence, so it should synchronize with itself, right?
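For contrast, here is my understanding of the situation that paragraph 4 does cover (my own sketch, not from the standard): a release store on one thread paired with an acquire fence on the other. This is precisely the release operation my code is missing, since my stores are relaxed.

#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                       // plain, non-atomic data
std::atomic<bool> flag{ false };    // the atomic object M

void writer()
{
    data = 42;
    flag.store(true, std::memory_order_release);          // A: release operation on M
}

void reader()
{
    while (!flag.load(std::memory_order_relaxed)) { }     // X: reads the value written by A
    std::atomic_thread_fence(std::memory_order_acquire);  // B: acquire fence
    assert(data == 42);             // guaranteed: A synchronizes with B
}

int main()
{
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}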

2 A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.

The scenario described is (sketched in code after this list)

  • Unsequenced writes
  • A release fence
  • X atomic write
  • Y atomic read
  • B acquire fence
  • Unsequenced reads (unsequenced writes will be visible here)
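In code, I understand that p2 scenario to look something like this (my own sketch, not from the standard):

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                    // the unsequenced write/read
std::atomic<bool> m{ false };       // the atomic object M

void producer()
{
    payload = 1;                                           // unsequenced write
    std::atomic_thread_fence(std::memory_order_release);   // A: release fence
    m.store(true, std::memory_order_relaxed);              // X: modifies M
}

void consumer()
{
    while (!m.load(std::memory_order_relaxed)) { }         // Y: reads the value written by X
    std::atomic_thread_fence(std::memory_order_acquire);   // B: acquire fence
    assert(payload == 1);           // guaranteed once Y has read X's value
}

int main()
{
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}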

However, in my case I don't have the atomic write + atomic read acting as a signal between the threads, and the release fence and the acquire fence are the same AcqRel fence on thread B. So what's actually happening is

  • Unsequenced writes
  • A release fence
  • B acquire fence
  • Unsequenced reads

Clearly, if the fence executes before an unsequenced write begins, it's a race and all bets are off. But it seems to me that if the fence executes after an unsequenced write begins but before it is committed, the write will be forced to finish before the unsequenced reads. This is exactly what I want, but I can't glean whether this is guaranteed by the standard.

Asked Sep 13 '18 by Adam

1 Answer

Let's say you spawn Thread A, which calls Allocator::Alloc(), then immediately spawn Thread B, which calls Allocator::GetAllocatedBytes(). Those two Allocator calls are now running concurrently. You don't know which one will actually happen first, because there's no ordering between them. Your only guarantee is that either Thread B will see the value of allocatedBytes before Thread A modifies it, or it will see the value of allocatedBytes after Thread A modifies it. You won't know which value Thread B saw until after GetAllocatedBytes() returns. (At least Thread B won't see a totally garbage value for allocatedBytes, because there's no data race on it thanks to your use of relaxed atomics.)

You seem to be concerned about the case where Thread A got as far as AtomicFetchAdd(), but for some reason, the change is not visible when Thread B calls AtomicLoad(). But so what? That's no different from the outcome where GetAllocatedBytes() runs entirely before AtomicFetchAdd(). And that's a totally valid outcome. Remember, either Thread B sees the modified value, or it doesn't.

Even if you change all the atomic operations/fences to MemoryOrder::SeqCst, it won't make any difference. In the scenario I described, Thread B can still either see the modified value or the unmodified value of allocatedBytes, because the two Allocator calls run concurrently.

As long as you insist on calling GetAllocatedBytes() while other threads are still calling Alloc() and Free(), that's really the most you can expect. If you want to get a more "accurate" value, just don't allow any concurrent calls to Alloc()/Free() while GetAllocatedBytes() is running! For example, if the program is shutting down, just join all the other threads before calling GetAllocatedBytes(). That'll give you an accurate number of allocated bytes at shutdown. The C++ standard even guarantees it, because the completion of a thread synchronizes with the call to join().
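Here's a rough sketch of that shutdown pattern, assuming the Allocator from the question (the Shutdown function and the worker list are hypothetical, just to show the shape of it):

#include <cstdio>
#include <thread>
#include <vector>

void Shutdown(Allocator& alloc, std::vector<std::thread>& workers)
{
    for (std::thread& t : workers)
        t.join();                    // thread completion synchronizes with join()

    // No Alloc()/Free() can be running concurrently anymore, so this value is exact,
    // even though the counter was only ever updated with relaxed operations.
    std::printf("allocated bytes at shutdown: %zu\n", alloc.GetAllocatedBytes());
}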

Answered by preshing