In our product we have an inlined mutex implementation, using a variety of platform and compiler specific methods for the hardware specific parts. One of our "rules" for over-optimized code that attempts to "cheat" is that if a variable is accessed both outside of the mutex and within it, then that variable must be declared volatile. I figured this applied to opaque mutex implementations too (such as pthread_mutex_lock/unlock), and this led to an interesting debate.
It's been asserted by one person that this is an indication of a compiler bug (especially when the mutex implementation is inlined and "not opaque" to the compiler). I gave the following example to dispute this:
int v = pSharedMem->myVariable ;
__asm__ __volatile__( "isync" : : : "memory" ) ;
v = pSharedMem->myVariable ;
In this LinuxPPC gcc code fragment, the compiler doesn't have any knowledge of the run time effects of the isync, other than what we can tell it via the memory constraint. You'd find such an isync instruction at the tail end of a mutex acquisition to prevent any execution of the instructions that follow the successful acquire of the mutex before the mutex was actually held (so if a load had been executed before the isync it would have to be discarded).
In this code fragment, we have the compiler barrier that prevents a rewrite of the code as if it were the following
int v = pSharedMem->myVariable ;
v = pSharedMem->myVariable ;
__asm__ __volatile__( "isync" : : : "memory" ) ;
or
__asm__ __volatile__( "isync" : : : "memory" ) ;
int v = pSharedMem->myVariable ;
v = pSharedMem->myVariable ;
(i.e., both of these compiler reorderings should be inhibited by the volatile attribute on the asm statement together with its "memory" clobber)
We also have the isync itself that prevents the first reordering at run time (but I don't think prevents the second which isn't as interesting).
However, my question is this: if myVariable is not declared volatile, is the "memory" constraint sufficient to guarantee that gcc reloads "v" after the isync? I'd still be inclined to mandate volatile for such a pattern, since this sort of code is too touchy with all the platform specific compiler builtins. That said, if we reduce the discussion to just GCC and this code fragment, is this asm memory constraint enough to have code generated with a pair of loads instead of just one?
The __asm__ __volatile__ with a "memory" clobber is required to, and will, act as a full reordering barrier. volatile on the variable is unnecessary. In fact, if you look at the Linux kernel definition of atomic_t, it does not use any volatile modifiers and relies completely on the __asm__ __volatile__ statements with appropriate constraints.
On the other hand, I believe volatile on its own does not in fact prohibit reordering at all, only caching and optimizing the value away altogether, so it's worthless for synchronization purposes.
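To make the point about the "memory" clobber concrete, here is a minimal sketch of the pattern from the question. It assumes GCC or Clang extended asm. The names COMPILER_BARRIER, struct shared and read_twice are hypothetical, and the asm body is left empty so the fragment builds on any target; on PowerPC you would put "isync" in place of the empty string. Since myVariable is not volatile here, it is the clobber that tells the compiler not to reuse the value loaded before the asm.

```c
/* Sketch only: COMPILER_BARRIER, struct shared and read_twice are
 * hypothetical names, not taken from any real mutex implementation. */

/* Empty asm with a "memory" clobber: emits no instruction, but the
 * compiler must assume any memory may have changed across it.
 * On PowerPC the string would be "isync" instead of "". */
#define COMPILER_BARRIER() __asm__ __volatile__( "" : : : "memory" )

struct shared {
    int myVariable;   /* deliberately NOT declared volatile */
};

/* Loads myVariable once before and once after the barrier.
 * The first value is stored through 'before', the second is returned. */
int read_twice(struct shared *pSharedMem, int *before)
{
    int v = pSharedMem->myVariable;   /* first load */
    *before = v;

    COMPILER_BARRIER();               /* value loaded above may not be reused */

    v = pSharedMem->myVariable;       /* second load from memory */
    return v;
}
```

Compiling this with something like gcc -O2 -S and reading the assembly is a reasonable way to check the effect: I'd expect two loads of myVariable with the barrier in place, and the compiler is free to merge them into one if the barrier is removed.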