I'm writing a program that computes an endgame tablebase for a chess variant. The algorithm for filling the tablebase works like this:
unsigned char
, each member representing one position (we always assume that it's white's turn). An array member is even if the position is lost, odd if it is won, 0xff
if it invalid, 0xfe
if it is a draw.0xff
, each position where white is mated with 0x00
and all other positions with 0x0fe
.0xfe
. Check if there is a move that leads to a position whose array member is even, if there is one, write one plus the number of that position into the member corresponding to the curren position. If all moves lead to positions indicated by odd numbers (i.e. this is a loosing position), mark this position as one plus the highest of these numbers to indicate how long the strongest defence takes.For speed, I'd like to parallelize step three. Careful reading yields that in each iteration, we only ever write one value (the number of the iteration) into the array. The following strategy obtains:
0xfe
because that means that the member was just written concurrently by another thread.Now obviously there is a race condition in this program but it doesn't matter as we always get the right result if there aren't any pink elephants (which can't exist if a char
is written atomically). However, since on paper there is a race condition, the C compiler may declare my program to be undefined and format my hard disk.
What can I do to parallelize this algorithm without violating any constraints of the C memory model and without causing massive slowdown (e.g. by adding locks)?
Here is a simplified algorithm that demonstrates the same concept but is stripped of all the unimportant stuff:
unsigned char a[n]
. Each array member is either 0 or 1.Since we only ever change a 0 to a 2, it doesn't matter in which order we process the array entries, even though there is technically a race condition if we do so in parallel (since we concurrently read/write the same object). How can I tell the compiler that it shouldn't care about the race condition without sacrificing performance?
This is what the _Atomic
type qualifier is for in C11. You would declare your array as
_Atomic unsigned char a[n];
which means that each element of the array can be read or written atomically.
Prior to C11, there's no standard way to do this, but generally, depending on the implementation, certain datatypes will be atomic for reads and writes. To know which those are, you'll have to look at the documentation for the implementation you are using.
Note that the default memory ordering for C11 _Atomic
accesses is memory_order_seq_cst
(sequential consistency), and if you don't need that, you can use atomic_load_explicit
and atomic_store_explicit
actions with a weaker memory ordering (ie memory_order_relaxed
in your example)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With