Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining stores/loads of consecutive atomic variables

Tags:

c++

c++11

Referring to a (slightly dated) paper by Hans Boehm, under "Atomic Operations". It mentions that the memory model (proposed at the time) would not prevent an optimizing compiler from combining a sequence of loads, or stores, on the same variable from being combined into a single load. His example is as follows (updated to hopefully correct current syntax):

Given

atomic<int> v;

The code

while( v.load( memory_order_acquire ) ) { ... }

Could be optimized to:

int a = v.load(memory_order_acquire);
while(a) { ... }

Obviously this would be bad, as he states. Now my question is, as the paper is a bit old, does the current C++0x memory model prevent this type of optimization, or is it still technically allowed?

My reading of the standard would seem to lean towards it being disallowed, but the use "acquire" semantics makes it less clear. For example if it were "seq_cst" the model seems to guarantee that the load must partake in a total ordering on the access and loading the value only once would thus seem to violate ordering (as it breaks the sequence happens before relationship).

For acquire I interpret 29.3.2 to mean that this optimization can not occur, since any "release" operation must be observed by the "acquire" operation. Doing only one acquire would seem not valid.

So my question is whether the current model (in the pending standard) would disallow this type of optimization? And if yes, then which part specifically forbids it? If no, does using a volatile atomic solve the problem?

And for bonus, if the load operation has a "relaxed" ordering is the optimization then allowed?

like image 347
edA-qa mort-ora-y Avatar asked Jul 13 '11 09:07

edA-qa mort-ora-y


1 Answers

The C++0x standard attempts to outlaw this optimization.

The relevant words are from 29.3p13:

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

If the thread that is doing the load only ever issues one load instruction then this is violated, as if it misses the write the first time, it will never see it. It doesn't matter which memory ordering is used for the load, it is the same for both memory_order_seq_cst and memory_order_relaxed.

However, the following optimization is allowed, unless there is something in the loop that forces an ordering:

while( v.load( memory_order_acquire ) ) {
    for(unsigned __temp=0;__temp<100;++__temp) {
        // original loop body goes here
    }
}

i.e. the compiler can generate code that executes the actual loads arbitrarily infrequently, provided it still executes them. This is even permitted for memory_order_seq_cst unless there are other memory_order_seq_cst operations in the loop, since this is equivalent to running 100 iterations between any memory accesses by other threads.

As an aside, the use of memory_order_acquire doesn't have the effect you describe --- it is not required to see release operations (other than by 29.3p13 quoted above), just that if it does see the release operation then it imposes visibility constraints on other accesses.

like image 150
Anthony Williams Avatar answered Sep 17 '22 12:09

Anthony Williams