We're using a fairly old version of Boost for now, and until we upgrade I need an atomic CAS operation in C++ for my code (we're not using C++0x yet either).
I created the following CAS function:
inline uint32_t CAS(volatile uint32_t *mem, uint32_t with, uint32_t cmp)
{
    uint32_t prev = cmp;
    // This version by Mans Rullgard of Pathscale
    __asm__ __volatile__ ("lock\n\t"
                          "cmpxchg %2,%0"
                          : "+m"(*mem), "+a"(prev)  // EAX carries the expected value in, the old memory value out
                          : "r"(with)
                          : "cc");
    return prev;
}
The code that uses the function looks roughly like this:
void myFunc(uint32_t &masterDeserialize)
{
    std::ostringstream debugStream;
    unsigned int tid = pthread_self();
    debugStream << "myFunc, threadId: " << tid << " masterDeserialize= " << masterDeserialize
                << " masterAddress = " << &masterDeserialize << std::endl;
    // memory fence
    __asm__ __volatile__ ("" ::: "memory");
    uint32_t retMaster = CAS(&masterDeserialize, 1, 0);
    debugStream << "After cas, threadid = " << tid << " retMaster = " << retMaster
                << " MasterDeserialize = " << masterDeserialize
                << " masterAddress = " << &masterDeserialize << std::endl;
    if(retMaster != 0) // not master deserializer.
    {
        debugStream << "getConfigurationRowField, threadId: " << tid << " NOT master. retMaster = " << retMaster << std::endl;
        // DO SOMETHING...
    }
    else
    {
        debugStream << "getConfigurationRowField, threadId: " << tid << " MASTER. retMaster = " << retMaster << std::endl;
        // DO SOME LOGIC
        // Signal we're done deserializing.
        masterDeserialize = 0;
    }
    std::cout << debugStream.str();
}
My test of this code spawns 10 threads and signals all of them to call the function with the same masterDeserialize variable.
This works most of the time, but once every few thousand to few million test iterations, two threads both take the MASTER path.
I'm not sure how this is possible, or how to avoid it.
I tried adding a memory fence before resetting masterDeserialize, thinking that CPU out-of-order execution might be the cause, but it had no effect on the result.
This runs on a machine with many cores, and it is compiled in debug mode, so GCC should not be reordering execution for optimization.
Any suggestions as to what is wrong with the above?
EDIT: I tried using the GCC builtin instead of the assembly code and got the same result.
inline uint32_t CAS(volatile uint32_t *mem, uint32_t with, uint32_t cmp)
{
    return __sync_val_compare_and_swap(mem, cmp, with);
}
I am running on a multi-core, multi-CPU machine, but it is a virtual machine. Is it possible that this behavior is somehow caused by the VM?
For reference: compare-and-swap (CAS) is an uninterruptible operation that reads a memory location, compares the value it read with an expected value, and stores a new value there only when they match; otherwise the memory is left untouched. Either way it returns the value that was previously in memory, so the caller can tell whether the swap happened.
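To make that concrete, here is a minimal non-atomic sketch of those semantics (the function name is purely illustrative; a real CAS such as lock cmpxchg or __sync_val_compare_and_swap performs all of these steps as one indivisible operation):

#include <stdint.h>

// Illustration only: the logic of CAS written as ordinary, NON-atomic C++.
// A real CAS executes all three steps as a single indivisible operation.
inline uint32_t cas_semantics(uint32_t *mem, uint32_t with, uint32_t cmp)
{
    uint32_t old = *mem;   // read the current value
    if (old == cmp)        // does it match the expected value?
        *mem = with;       //   ...then store the new value
    return old;            // always return what was in memory before
}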
Not only two but any number of threads can in theory become "masters" in this code. The problem is that the thread that took the master path sets masterDeserialize back to 0 after completing its work, which makes the flag acquirable again by a thread that arrives at the CAS very late (e.g. due to preemption).
The fix is simple: add a third state (with e.g. the value 2) meaning "master has completed", and store that state (instead of the initial state of 0) at the end of the master's path to signal that its job is done. That way only one of the threads that call myFunc can ever see 0, which gives you the guarantee you need. To reuse the flag, you'd need to explicitly reinitialize it to 0.
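A minimal sketch of that three-state version, reusing the __sync_val_compare_and_swap builtin from the question (the state names and the stripped-down body are illustrative, not the original code):

#include <stdint.h>

static const uint32_t IDLE           = 0; // initial state, before any deserialization
static const uint32_t MASTER_RUNNING = 1; // one thread is currently deserializing
static const uint32_t MASTER_DONE    = 2; // master finished; the flag never returns to 0 here

void myFunc(uint32_t &masterDeserialize)
{
    uint32_t prev = __sync_val_compare_and_swap(&masterDeserialize, IDLE, MASTER_RUNNING);
    if (prev != IDLE)
    {
        // Not the master: the master is either still running (1) or already done (2).
        // DO SOMETHING...
    }
    else
    {
        // DO SOME LOGIC
        // Signal completion with a distinct value, so a late-arriving thread can
        // never observe 0 again and become a second master.
        __sync_val_compare_and_swap(&masterDeserialize, MASTER_RUNNING, MASTER_DONE);
    }
}

To run the whole thing again, reset the flag to 0 from a single known-quiescent point (for example between test iterations), not from inside myFunc.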