Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data race with memcpy, undefined behavior?

I write to an area of memory (with memcpy) in one thread, and copy it to a new location with memcpy in another. Sometimes these operations can overlap, resulting in a data race. Programs with data races invoke undefined behavior and are invalid.

In this case I check after the copy that the data copied was valid (that no race actually occurred.) If a race did occur, I discard the copied data. However, AFAIK, that doesn't let me off the hook regarding the UB. I think it's still UB whether or not I use the result of the data race.

Now I could write my own memcpy routine in assembly (or just copy and paste the one from libc), which would side step the whole UB issue. Assembly is not C++ and whatever happens in assembly won't give the compiler license to invoke the nasal demons[1]. Incidentally is that true for inline asm as well as externally compiled and linked asm? Although memcpy is already assembly in any modern libc, it can also be treated specially by the compiler, which often does optimizations like a small inline memcpy for known sizes and alignments - which may invoke the nasal demons again.

Am I overthinking things here? It's hard to imagine a compiler so god-like that it can detect a data race at compile time - and at the same time so stupid that the optimizer uses it to generate bad code instead of reporting it. But compilers lately have a way of pushing both of those limits - so I feel the need to seek advice here on Stack Overflow.

[Edit] Since there is a lot of curiosity about how I'm synchronizing things here let me explain. The pointer to the memory being copied is shared between threads. It's accessed with atomic load(mo_acquire). Then the memory is copied to a new location. Then there's a LoadLoad barrier, followed by a second load(mo_relaxed) of the pointer. If the pointers don't match, the result of the copy is discarded because another thread may have raced with this thread during the copy. The thread that writes to the memory first updates the pointer to null with atomic store(mo_relaxed) followed by a StoreStore barrier and the racing memcpy. So while the two calls to memcpy in different threads could be a data race - in reality that's always detected and the result always discarded in that case. I call this scheme copy-on-read and I use it to allow resurrecting objects in a cache after they've been evicted but before the memory has been re-used without any mutexes or "strong" synchronization involved.

[1]: I long for a more civilized time when compilers report UB instead of abusing it for optimizations that may be contrary to the behavior the programmer expects.

like image 542
Eloff Avatar asked Sep 25 '22 05:09

Eloff


1 Answers

Synchronization locks use a very similar method to what you are doing, although only on very small amounts of memory. A synchronization lock will be faster if the rate of data-race occurence is high, but your approach may actually be faster if the race rate is low.

Although the result of the memcpy is undefined, this is not undefined behavior, as long as you can detect whether a race occurred, and know whether to ignore the garbage results.

This doesn't sound as though you run the risk of a protection violation or similar crash bug; I have not worked with memcpy enough to know if there are any scenarios where it could crash during overlapping operations, but I don't believe it should.

So, as long as the behavior can be detected, this is not necessarily a bad thing, if it meets your needs in a way that is significantly better than standard approaches. I wouldn't recommend using this "just because", but if you need speed that you can't get with traditional locks, and you document the well-defined-but-non-standard behavior very thoroughly in whatever way you normally provide documentation for maintenance, it is acceptable.

As for the compiler optimization comments, I have never seen a compiler rely on undefined behavior to optimize code, and since C++ compilers are required to guarantee specific behavior according to the C++ spec, I would immediately discontinue using any compiler that relies on undefined behavior for that purpose. Library code specifically documents that simultaneous read/write operations across threads are not supported and shouldn't be done, so using library code across threads in this way doesn't qualify as undefined behavior, but rather intentional misuse of the library code at your own risk, and all express or implied warranties are void.

like image 188
Matt Jordan Avatar answered Sep 28 '22 04:09

Matt Jordan