Fences in C++0x, guarantees just on atomics or memory in general

Tags: c++, multithreading, c++11, memory-model, memory-barriers

The C++0x draft has a notion of fences which seems very distinct from a CPU/chip-level notion of fences, or from what the Linux kernel guys expect of fences. The question is whether the draft really implies an extremely restricted model, or whether the wording is just poor and it actually implies true fences.

For example, under 29.8 Fences it states things like:

A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.

It uses the terms "atomic operations" and "atomic object". The draft does define such atomic operations and methods, but does the rule mean only those? A release fence sounds like a store fence, and a store fence that doesn't guarantee the write of all data prior to the fence is nearly useless. The same goes for a load (acquire) fence and a full fence.

So, are the fences/barriers in C++0x proper fences with incredibly poor wording, or are they as extremely restricted/useless as described?

In terms of C++, say I have this existing code (assuming fences are available as high-level constructs right now, instead of, say, using __sync_synchronize in GCC):

```cpp
// Thread A:
b = 9;
store_fence();
a = 5;

// Thread B:
if( a == 5 )
{
  load_fence();
  c = b;
}
```

Assume a, b, and c are of a size that is copied atomically on the platform. The above means that c will only ever be assigned 9. Note that we don't care when Thread B sees a == 5, just that when it does, it also sees b == 9.

What is the code in C++0x that guarantees the same relationship?

ANSWER: If you read my chosen answer and all the comments, you'll get the gist of the situation. C++0x appears to force you to use an atomic variable together with fences, whereas a normal hardware fence has no such requirement. In many cases this can still be used to port existing concurrent algorithms, so long as sizeof(atomic<T>) == sizeof(T) and atomic<T>::is_lock_free() returns true.

It is unfortunate, however, that is_lock_free is not constexpr; otherwise it could be used in a static_assert. Having atomic<T> degenerate to using locks is generally a bad idea: a lock-free algorithm whose atomics silently fall back to mutexes will suffer far worse contention than an algorithm designed around mutexes in the first place.


asked Apr 05 '11 04:04

edA-qa mort-ora-y

1 Answer

Fences provide ordering on all data. However, in order to guarantee that the fence operation from one thread is visible to a second, you need to use atomic operations for the flag; otherwise you have a data race.

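```cpp
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<bool> ready(false);
int data=0;

void thread_1()
{
    data=42;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true,std::memory_order_relaxed);
}

void thread_2()
{
    if(ready.load(std::memory_order_relaxed))
    {
        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout<<"data="<<data<<std::endl;
    }
}

int main()
{
    // Prints "data=42" if thread_2 sees the flag, otherwise prints nothing.
    std::thread t2(thread_2);
    std::thread t1(thread_1);
    t1.join();
    t2.join();
}
```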

If thread_2 reads ready as true, then the fences ensure that data can safely be read, and the output will be data=42. If it reads ready as false, then you cannot be sure that thread_1 has issued the appropriate fence, so a fence in thread_2 would still not provide the necessary ordering guarantees. If the if in thread_2 were omitted, the access to data would be a data race and undefined behaviour, even with the fence.

Clarification: A std::atomic_thread_fence(std::memory_order_release) is generally equivalent to a store fence, and will likely be implemented as such. However, a single fence on one processor does not guarantee any memory ordering: you need a corresponding fence on a second processor, AND you need to know that when the acquire fence was executed the effects of the release fence were visible to that second processor. It is obvious that if CPU A issues an acquire fence, and then 5 seconds later CPU B issues a release fence, then that release fence cannot synchronize with the acquire fence. Unless you have some means of checking whether or not the fence has been issued on the other CPU, the code on CPU A cannot tell whether it issued its fence before or after the fence on CPU B.

The requirement that you use an atomic operation to check whether or not the fence has been seen is a consequence of the data race rules: you cannot access a non-atomic variable from multiple threads without an ordering relationship, so you cannot use a non-atomic variable to check for an ordering relationship.

A stronger mechanism such as a mutex can of course be used, but that would render the separate fence pointless, as the mutex would provide the fence.

Relaxed atomic operations are likely just plain loads and stores on modern CPUs, though possibly with additional alignment requirements to ensure atomicity.

Code written to use processor-specific fences can readily be changed to use C++0x fences, provided the operations used to check synchronization (rather than those used to access the synchronized data) are atomic. Existing code may well rely on the atomicity of plain loads and stores on a given CPU, but conversion to C++0x will require using atomic operations for those checks in order to provide the ordering guarantees.

answered Oct 29 '22 14:10

Anthony Williams
