The C++0x draft has a notion of fences that seems quite distinct from the CPU/chip-level notion of fences, or from what the Linux kernel developers expect of fences. The question is whether the draft really implies an extremely restricted model, or whether the wording is just poor and it actually implies true fences.
For example, under 29.8 Fences it states things like:
A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.
It uses the terms atomic operations and atomic object. Such atomic operations and methods are defined in the draft, but does it mean only those? A release fence sounds like a store fence. A store fence that doesn't guarantee the write of all data prior to the fence is nearly useless. The same goes for a load (acquire) fence and a full fence.
So, are the fences/barriers in C++0x proper fences and the wording just incredibly poor, or are they extremely restricted/useless as described?
In terms of C++, say I have this existing code (assuming fences are available as high level constructs right now -- instead of say using __sync_synchronize in GCC):
Thread A:
b = 9;
store_fence();
a = 5;
Thread B:
if( a == 5 )
{
load_fence();
c = b;
}
Assume a, b, c are of a size to have atomic copy on the platform. The above means that c will only ever be assigned 9. Note we don't care when Thread B sees a==5, just that when it does it also sees b==9.
What is the code in C++0x that guarantees the same relationship?
ANSWER: If you read my chosen answer and all the comments you'll get the gist of the situation. C++0x appears to force you to use an atomic with fences, whereas a normal hardware fence does not have this requirement. In many cases this can still be used to replace concurrent algorithms so long as sizeof(atomic<T>) == sizeof(T) and atomic<T>.is_lock_free() == true.
It is unfortunate, however, that is_lock_free is not a constexpr. That would allow it to be used in a static_assert. Having atomic<T> degenerate to using locks is generally a bad idea: atomic algorithms that use mutexes will have horrible contention problems compared to a mutex-designed algorithm.
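As a hedged aside: later revisions of the standard (C++17 onward, which postdates this discussion) added a static constexpr member, std::atomic<T>::is_always_lock_free, that can be used in a static_assert. The sketch below assumes a C++17 compiler; the type alias Flag and the helper flag_is_lock_free are hypothetical names chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical payload type, chosen only for illustration.
using Flag = std::uint32_t;

// C++17 and later: is_always_lock_free is a compile-time constant,
// so the build fails if atomic<Flag> would fall back to locks.
static_assert(std::atomic<Flag>::is_always_lock_free,
              "atomic<Flag> must be lock-free on this platform");

// The size condition mentioned above can be checked the same way.
static_assert(sizeof(std::atomic<Flag>) == sizeof(Flag),
              "atomic<Flag> must not add storage overhead");

// The runtime query from the original draft is still available.
bool flag_is_lock_free()
{
    std::atomic<Flag> f{0};
    const bool lock_free = f.is_lock_free();
    // If the compile-time property holds, the runtime query must agree.
    assert(lock_free);
    return lock_free;
}
```

On a platform where either condition does not hold, compilation stops at the static_assert rather than silently producing a lock-based atomic.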
A memory fence is a type of barrier instruction that causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the memory fence instruction. This typically means that operations issued prior to the fence are guaranteed to be performed before operations issued after the fence.
In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
Fences provide ordering on all data. However, in order to guarantee that the fence operation from one thread is visible to a second, you need to use atomic operations for the flag, otherwise you have a data race.
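#include <atomic>
#include <iostream>

std::atomic<bool> ready(false);  // flag used to check synchronization; must be atomic
int data = 0;                    // ordinary, non-atomic payload

void thread_1()
{
    data = 42;
    // Release fence: orders the write to data before the store to ready.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

void thread_2()
{
    if (ready.load(std::memory_order_relaxed))
    {
        // Acquire fence: pairs with the release fence once ready was read as true.
        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout << "data=" << data << std::endl;
    }
}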
If thread_2 reads ready to be true, then the fences ensure that data can safely be read, and the output will be data=42. If ready is read to be false, then you cannot guarantee that thread_1 has issued the appropriate fence, so a fence in thread 2 would still not provide the necessary ordering guarantees: if the if test in thread_2 was omitted, the access to data would be a data race and undefined behaviour, even with the fence.
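To try the example above end to end, here is a minimal sketch that runs both sides on real threads. It differs from the original in two labeled ways: the reader spins until it sees the flag (so the result is deterministic), and the state is local to a hypothetical helper, run_once, so it can be called repeatedly.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the writer/reader pair once and
// returns the value of data that the reader observed.
int run_once()
{
    std::atomic<bool> ready(false);
    int data = 0;   // plain, non-atomic payload
    int seen = -1;  // written by the reader, read after join()

    std::thread writer([&] {
        data = 42;
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread reader([&] {
        // Assumption: spin until the flag is observed, instead of a single if.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // The flag was read as true, so this fence pairs with the release fence.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = data;
    });

    writer.join();
    reader.join();
    assert(seen == 42);  // follows from the fence pairing
    return seen;
}
```

Moving the read of data ahead of the loop, or dropping the check of ready, would reintroduce the data race described above.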
Clarification: A std::atomic_thread_fence(std::memory_order_release) is generally equivalent to a store fence, and will likely be implemented as such. However, a single fence on one processor does not guarantee any memory ordering: you need a corresponding fence on a second processor, AND you need to know that when the acquire fence was executed the effects of the release fence were visible to that second processor. It is obvious that if CPU A issues an acquire fence, and then 5 seconds later CPU B issues a release fence, then that release fence cannot synchronize with the acquire fence. Unless you have some means of checking whether or not the fence has been issued on the other CPU, the code on CPU A cannot tell whether it issued its fence before or after the fence on CPU B.
The requirement that you use an atomic operation to check whether or not the fence has been seen is a consequence of the data race rules: you cannot access a non-atomic variable from multiple threads without an ordering relationship, so you cannot use a non-atomic variable to check for an ordering relationship.
A stronger mechanism such as a mutex can of course be used, but that would render the separate fence pointless, as the mutex would provide the fence.
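For comparison, a minimal sketch of the mutex variant: because every access to the flag and the data happens while holding the lock, the flag can be a plain bool and no explicit fence appears. The helper name run_with_mutex and the polling loop are assumptions made for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: same writer/reader pair, ordered by a mutex.
int run_with_mutex()
{
    std::mutex m;
    bool ready = false;  // plain bool: only touched while holding m
    int data = 0;
    int seen = -1;

    std::thread writer([&] {
        std::lock_guard<std::mutex> lock(m);
        data = 42;
        ready = true;
    });

    std::thread reader([&] {
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(m);
                if (ready) {
                    seen = data;  // ordered by the lock, no fence needed
                    return;
                }
            }
            std::this_thread::yield();  // release the lock before retrying
        }
    });

    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```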
Relaxed atomic operations are likely just plain loads and stores on modern CPUs, though possibly with additional alignment requirements to ensure atomicity.
Code written to use processor-specific fences can readily be changed to use C++0x fences, provided the operations used to check synchronization (rather than those used to access the synchronized data) are atomic. Existing code may well rely on the atomicity of plain loads and stores on a given CPU, but conversion to C++0x will require using atomic operations for those checks in order to provide the ordering guarantees.
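Applied to the Thread A / Thread B code from the question, the conversion might look like the sketch below. Only a, the variable used to check synchronization, becomes atomic; b and c stay plain ints. The helper name run_question_example and the sentinel value -1 are assumptions added so the result can be inspected.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns c, or -1 if thread B never saw a == 5.
int run_question_example()
{
    std::atomic<int> a(0);  // the checked variable must be atomic
    int b = 0;              // plain data, ordered by the fences
    int c = -1;             // sentinel: not yet assigned

    std::thread thread_a([&] {
        b = 9;
        std::atomic_thread_fence(std::memory_order_release);  // was store_fence()
        a.store(5, std::memory_order_relaxed);
    });

    std::thread thread_b([&] {
        if (a.load(std::memory_order_relaxed) == 5) {
            std::atomic_thread_fence(std::memory_order_acquire);  // was load_fence()
            c = b;
        }
    });

    thread_a.join();
    thread_b.join();
    // c is either untouched or exactly 9, never a stale value of b.
    assert(c == -1 || c == 9);
    return c;
}
```

As in the question, nothing is promised about when thread B sees a == 5, only that if it does, it also sees b == 9.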