volatile vs. compiler barrier with gcc inline assembly

Question

In our product we have an inlined mutex implementation that uses a variety of platform- and compiler-specific methods for the hardware-specific parts. One of our "rules" for some over-optimized code that attempts to "cheat" is that if a variable is accessed both outside of the mutex and within it, then that variable must be declared volatile. I figured this applied to opaque mutex implementations too (such as pthread_mutex_lock/unlock), and this led to an interesting debate.

One person has asserted that needing volatile here is an indication of a compiler bug (especially when the mutex implementation is inlined and "not opaque" to the compiler). I gave the following example to dispute this:

int v = pSharedMem->myVariable ;

__asm__ __volatile__( "isync" : : : "memory" ) ;

v = pSharedMem->myVariable ;

In this LinuxPPC gcc code fragment, the compiler doesn't have any knowledge of the run-time effects of the isync, other than what we can tell it via the memory constraint. You'd find such an isync instruction at the tail end of a mutex acquisition, to prevent the instructions that follow the successful acquire of the mutex from executing before the mutex is actually held (so if a load had been executed before the isync, it would have to be discarded).

In this code fragment, we have the compiler barrier, which prevents a rewrite of the code as if it were the following:

int v = pSharedMem->myVariable ;
v = pSharedMem->myVariable ;

__asm__ __volatile__( "isync" : : : "memory" ) ;

or

__asm__ __volatile__( "isync" : : : "memory" ) ;

int v = pSharedMem->myVariable ;
v = pSharedMem->myVariable ;

(i.e., both of these compiler reorderings should be inhibited by the __volatile__ qualifier and the "memory" clobber on the asm statement)

We also have the isync itself, which prevents the first reordering at run time (but I don't think it prevents the second, which isn't as interesting).

However, my question is this: if myVariable is not declared volatile, is the "memory" constraint sufficient to guarantee that gcc re-loads "v" after the isync? I'd still be inclined to mandate volatile for such a pattern, since this sort of code is too touchy with all the platform-specific compiler builtins. That said, if we reduce the discussion to just GCC and this code fragment, is this asm memory constraint enough to produce generated code with a pair of loads instead of just one?

Jan Hudec · Accepted Answer

The __asm__ __volatile__ statement with a "memory" clobber is required to act, and will act, as a full compiler reordering barrier. volatile on the variable is unnecessary. In fact, if you look at the Linux kernel definition of atomic_t, it does not use any volatile modifiers and relies completely on the __asm__ __volatile__ statements with appropriate constraints.
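To make the reload behaviour concrete, here is a minimal sketch that should build with GCC on any architecture. It swaps the PowerPC isync for an empty instruction string, so it shows only the compiler-level effect of the "memory" clobber, not the run-time effect of isync. The names struct shared_mem, COMPILER_BARRIER, sum_across_barrier and check_sum_across_barrier are hypothetical and not from the question.

```c
#include <assert.h>

/* Hypothetical stand-in for the structure behind pSharedMem. */
struct shared_mem {
    int myVariable;               /* deliberately not volatile */
};

/* Compiler-only barrier: emits no instruction. The "memory" clobber tells
 * GCC that memory may have been read or written here, so it cannot keep
 * reusing a value it loaded from memory before this point. */
#define COMPILER_BARRIER() __asm__ __volatile__( "" : : : "memory" )

/* Loads the field once before and once after the barrier. */
int sum_across_barrier(struct shared_mem *pSharedMem)
{
    int v = pSharedMem->myVariable;
    COMPILER_BARRIER();
    v += pSharedMem->myVariable;  /* a second load, not a reuse of the first */
    return v;
}

/* Single-threaded sanity check of the arithmetic only. */
void check_sum_across_barrier(void)
{
    struct shared_mem s = { 21 };
    assert(sum_across_barrier(&s) == 42);
}
```

One way to check the claim is to compile this with gcc -O2 -S and look for two loads of myVariable in sum_across_barrier; removing the barrier should let the optimizer fold them into a single load.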

On the other hand, I believe volatile on its own does not in fact prohibit reordering at all, only caching and optimizing the value away altogether, so it's worthless for synchronization purposes.
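A small sketch of that point, assuming a flag-and-payload pattern that is not in the question: GCC does not order accesses to non-volatile objects with respect to volatile accesses, so a volatile flag alone does not pin down where the plain store lands. The names payload, ready, publish_volatile_only, publish_with_barrier and check_publish are hypothetical.

```c
#include <assert.h>

/* Same compiler-only barrier as in the question, minus the isync. */
#define COMPILER_BARRIER() __asm__ __volatile__( "" : : : "memory" )

int payload;            /* plain data, not volatile */
volatile int ready;     /* volatile flag */

/* Only the flag is volatile: the compiler is allowed to move the
 * payload store below the flag store. */
void publish_volatile_only(int value)
{
    payload = value;
    ready = 1;
}

/* The "memory" clobber keeps the compiler from moving the payload
 * store past the barrier, so it is emitted before the flag store. */
void publish_with_barrier(int value)
{
    payload = value;
    COMPILER_BARRIER();
    ready = 1;
}

/* Single-threaded sanity check: both stores happen. */
void check_publish(void)
{
    publish_with_barrier(7);
    assert(payload == 7 && ready == 1);
}
```

This constrains only the compiler. On a weakly ordered CPU such as PowerPC, a hardware barrier instruction would still be needed in place of the empty string for another processor to observe the two stores in order.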

Tags: gcc, constraints, inline-assembly

Peeter Joot
