Is there any issue with having a race condition in your code when the operation is writing a single constant value? For example, suppose a parallel loop populates a seen array for every value that is in another array arr (assuming no issues with out-of-bounds indices). The critical section could be the code below:
//parallel body with index i
int val = arr[i];
seen[val] = true;
Since the only value being written is true, does that make a mutex unnecessary, and possibly detrimental to performance? Even if threads stomp on each other, they would just be filling in the address with the same value, correct?
The C++ memory model does not give you a free pass for writing the same value.
If two threads write to the same non-atomic object without synchronization, that is a data race. And a data race means your program has undefined behavior. And undefined behavior occurring anywhere in your program's execution means that the behavior of your program, both before and after the point of undefined behavior, is not restricted by the C++ standard in any way.
A given compiler is free to provide a more permissive memory model. I'm unaware of any that do.
One thing you must understand is that C++ is not an assembler macro language. It doesn't have to produce the naive assembler you imagine in your head. C++ instead tries to make it easy for your compiler to produce assembler, which is a very different thing.
Compilers can and do reason "if X happens, we get undefined behavior, so I'll optimize on the assumption that X does not happen" when generating code. In this case, the compiler can conclude that no program with defined behavior could ever have the same val in two different unsynchronized threads.
All of this can happen long before any assembly is generated.
And at the assembly level, some hardware might do funny things with unaligned assignment to multi-byte values. Some hardware could (in theory; I'm unaware of any in practice) raise traps when instructions that claim to be single-thread writes occur in two different cores on the same bytes.
So this is UB in C++. And once you have UB, you have to audit the assembly produced for your program everywhere the compiler that touches this code can see. If you use LTO, that means your entire program, or at least everything that calls or interacts with the code that has UB, out to an unclear distance.
Just write defined behavior. And only if this turns out to be a mission critical performance bottleneck should you spend more effort on optimizing it (first faster defined behavior, and only if that fails do you even consider UB).
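As a rough sketch of what "faster defined behavior" could look like here, assuming you are free to change the element type of seen: make each flag a std::atomic<bool> and store with memory_order_relaxed. No mutex is involved, and on mainstream hardware a relaxed store of a bool typically compiles to an ordinary store instruction, but the concurrent writes are no longer a data race. The function name mark_seen and the parameters range and num_threads are made up for illustration; they are not from the question.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: returns a table where result[v] is true if v occurs in arr.
// Assumes every value in arr lies in [0, range) and num_threads >= 1.
std::vector<bool> mark_seen(const std::vector<int>& arr, std::size_t range,
                            unsigned num_threads) {
    // One atomic flag per possible value, explicitly cleared before any thread starts.
    std::vector<std::atomic<bool>> seen(range);
    for (auto& flag : seen) flag.store(false, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&arr, &seen, t, num_threads] {
            // Each worker handles indices t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t i = t; i < arr.size(); i += num_threads) {
                const int val = arr[i];
                assert(val >= 0 && static_cast<std::size_t>(val) < seen.size());
                // Several threads may store true to the same flag at once.
                // That is fine for an atomic object, even with relaxed ordering.
                seen[static_cast<std::size_t>(val)].store(true, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with the end of each worker, so the loads below
    // observe every store made by the workers.
    for (auto& w : workers) w.join();

    std::vector<bool> result(range);
    for (std::size_t v = 0; v < range; ++v) {
        result[v] = seen[v].load(std::memory_order_relaxed);
    }
    return result;
}
```

Relaxed ordering is enough in this sketch only because nothing reads the flags until after the workers have been joined. If other threads read seen while the loop is still running and then act on what they find, you would need to think about stronger orderings.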