Below is a simplified C program that demonstrates a problem I am having with a concurrent stack implemented using the GNU built in compare and swap on an intel cpu. It took me a while to understand what was happening, but now that I do I see that it is well within the guarantees provided by atomic compare and swap.
When a node is popped from the stack, modified, then placed back on the stack, the modified value may become the new head of the stack, corrupting it. The comments in test_get describe the order of events that cause this.
Is there any way to reliably use the same node with the same stack more than once? This is an exaggerated example but even returning an unmodified node to gHead could leak other nodes. The original point of this data structure was to repeatedly use the same nodes.
typedef struct test_node {
struct test_node *next;
void *data;
} *test_node_p;
test_node_p gHead = NULL;
unsigned gThreadsDone = 0;
void test_put( test_node_p inMemory ) {
test_node_p scratch;
do {
scratch = gHead;
inMemory->next = scratch;
} while ( !__sync_bool_compare_and_swap( &gHead , scratch , inMemory ) );
}
test_node_p test_get( void ) {
test_node_p result;
test_node_p scratch;
do {
result = gHead;
if ( NULL == result ) break;
// other thread acquires result, modifies next
scratch = result->next; // scratch is 0xDEFACED...
// other thread returns result to gHead
} while ( !__sync_bool_compare_and_swap( &gHead , result , scratch ) );
// this thread corrupts gHead with 0xDEFACED... value
if ( NULL == result ) {
result = (test_node_p)malloc( sizeof(struct test_node) );
}
return result;
}
void *memory_entry( void *in ) {
test_node_p node;
int index , count = 1000;
for ( index = 0 ; index < count ; ++index ) {
node = test_get();
*(uint64_t *)(node) |= 0xDEFACED000000000ULL;
test_put( node );
}
__sync_add_and_fetch(&gThreadsDone,1);
return NULL;
}
void main() {
unsigned index , count = 8;
pthread_t thread;
gThreadsDone = 0;
for ( index = 0 ; index < count ; ++index ) {
pthread_create( &thread , NULL , memory_entry , NULL );
pthread_detach( thread );
}
while ( __sync_add_and_fetch(&gThreadsDone,0) < count ) {}
}
I am running this test with 8 logical cores. When I use only 4 threads the problem rarely occurs but with 8 it is easily reproducible. I have a MacBook with an Intel Core i7.
I am not interested in using some library or framework that has solved this, I want to know how it was solved. Thank you.
Edit:
Here are two solutions that do not work in my case.
Some architectures provide ll/sc instruction pairs that perform atomic tests on the address instead of the value. Any write to the address, even of the same value, prevents success.
Some architectures provide larger than pointer size compare and swap. With this, a monotonic counter is paired with the pointer which is atomically incremented every time it is used so the value always changes. Some intel chips support this but there is no GNU wrapper.
Here is a play by play of the problem. Two threads, A and B, reach the point in test_get
where they have the same value for result
and it is not NULL
. Then the following sequence occurs:
result
from test_get
result
result
, getting whatever thread A put thereresult
and calls test_put
test_put
putting result back on gHead
test_get
and passesgHead
with the value from Thread AHere is a similar scenario with three threads that does not require Thread A to modify result
.
result
from test_get
result
result
, getting a valid value in scratch
test_put
with an unrelated value and succeedstest_put
and succeeds in putting result
back on gHead
test_get
and passesgHead
by leaking whatever thread C addedIn either scenario, the problem is that thread A passes the compare and swap twice, once for get then again for put, before thread B reaches the compare and swap for get. Any value thread B has for scratch should be discarded, but is not because the value in gHead appears correct.
Any solution that makes this less likely without actually preventing it just makes the bug harder to track down.
Note that the scratch variable is just a name for wherever the dereferenced value of result is placed before the atomic instruction begins. Removing the name does remove the time slice between dereference and compare that may be interrupted.
Note also that atomic means it succeeds or fails as a whole. Any aligned read of a pointer is implicitly atomic on the hardware in question. As far as other threads are concerned, there is no interruptible point where only half the pointer has been read.
Never access an atomic variable through a simple evaluation. Also, for a compare and swap loop like yours, __sync_val_compare_and_swap
is more convenient, I think.
/* read the head atomically */
result = __sync_val_compare_and_swap(&gHead, 0, 0);
/* compare and swap until we succeed placing the next field */
while ((oldval = result)) {
result = __sync_val_compare_and_swap(&gHead, result, result->next);
if (oldval == result) break;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With