I want to allocate memory in one thread, and safely "lend" the pointer to another thread so it can read that memory.
I'm using a high-level language that translates to C. The language has threads (the underlying threading API is unspecified, since it's cross-platform; see below) and supports the standard C multi-threading primitives, such as atomic compare-exchange, but these are not really documented (no usage examples). The constraints of this high-level language are:
Now, this is impractical for large messages (I don't want the copy) or variable-sized messages (I think the array size is part of the type). I want to send such messages, and here's an outline of how I want to achieve it:
I need to know how to ensure that this works without data races. My understanding is that I need to use memory fences, but I'm not entirely sure which one (ATOMIC_RELEASE, ...) and where in the loop (or if I need any at all).
Because my high-level language needs to be cross-platform, I need the answer to work on:
- pthread_mutex_init, pthread_mutex_lock and pthread_mutex_unlock (POSIX)
- InitializeCriticalSection, EnterCriticalSection and LeaveCriticalSection (Windows)
If it helps, I'm assuming the following architectures:
And using the following compilers (you can assume a "recent" version of all of them):
I've only built on Windows so far, but when the app is done, I want to port it to the other platforms with minimal work. Therefore I'm trying to ensure cross-platform compatibility from the start.
Here is my assumed work-flow:
Real code is too large to post. Here is simplified pseudocode (just enough to show how the shared memory is accessed) using a mutex, as the message queues do:
static pointer p = null
static mutex m = ...
static thread_A_buffer = malloc(...)
Thread-A:
    do:
        // Send pointer to data
        int index = findFreeIndex(thread_A_buffer)
        // Assume a different value (not 42) every time
        thread_A_buffer[index] = 42
        // Call some "memory fence" here (after writing, before sending)?
        lock(m)
        p = &(thread_A_buffer[index])
        signal()
        unlock(m)
        // Wait for processing
        // (in reality, would wait for a second signal...)
        pointer p_a = null
        do:
            // sleep
            lock(m)
            p_a = p
            unlock(m)
        while (p_a != null)
        // Free data
        thread_A_buffer[index] = 0
        freeIndex(thread_A_buffer, index)
    while true
Thread-B:
    while true:
        // Wait for data
        pointer p_b = null
        while (p_b == null)
            lock(m)
            wait()
            p_b = p
            unlock(m)
        // Call some "memory fence" here (after receiving, before reading)?
        // Process data
        print *p_b
        // Say we are done
        lock(m)
        p = null
        // (in reality, would send a second signal...)
        unlock(m)
Would this solution work? Reformulating the question: does Thread-B print "42"? Always, on all the considered platforms and OSes (pthreads and Windows critical sections)? Or do I need to add other threading primitives, such as memory fences?
I've spent hours looking at many related SO questions and read some articles, but I'm still not totally sure. Based on @Art's comment, I probably don't need to do anything. I believe this is based on this statement from the POSIX standard, 4.12 Memory Synchronization:
[...] using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads.
My problem is that this sentence doesn't clearly specify whether they mean "all the accessed memory" or "only the memory accessed between lock and unlock". I have read people arguing for both cases, and even some implying it was written imprecisely on purpose, to give compiler implementers more leeway!
Furthermore, this applies to pthreads, but I need to know how it applies to Windows threading as well.
I'll accept any answer that, based on quotes/links from either standard documentation or some other highly reliable source, either proves that I don't need fences or shows which fences I need, under the aforementioned platform configurations, at least for the Windows/Linux/MacOS case. If Windows threads behave like pthreads in this case, I'd like a link/quote for that too.
The following are some (of the best) related questions/links I read, but the presence of conflicting information causes me to doubt my understanding.
My review of the C++11 documentation, and the similar wording in C11 (n1570.pdf), leads me to the following understanding.
Data can be safely shared between threads if some form of cooperative synchronization is performed between them. If there is a queue, and one thread reads an item from the queue within a mutex, and items were added to the queue while the mutex was held, then the memory readable in the second thread is the memory that was written in the first thread.
That is because the compiler and the underlying CPU are not allowed to reorder side effects across that ordering.
From n1570
An evaluation A inter-thread happens before an evaluation B if A synchronizes with B, A is dependency-ordered before B, or, for some evaluation X:
— A synchronizes with X and X is sequenced before B,
— A is sequenced before X and X inter-thread happens before B, or
— A inter-thread happens before X and X inter-thread happens before B
So, to ensure the memory visible in the second thread is consistent, the following guarantees the results.
The synchronized write causes all of thread A's preceding operations to be completed and made visible before thread B performs the corresponding read.
After the data has been handed to the queue for processing by the other thread, the first thread cannot safely (i.e., without a lock) modify or read any of the object's memory until it knows (through some synchronization mechanism) that the other thread is no longer accessing it. It will only see correct results if this is done through some synchronization mechanism.
Both the C++ and C standards are intended to formalize the existing behavior of compilers and CPUs. So although there are fewer formal guarantees under pthreads and C99, these would be expected to be consistent.
Thread A
int index = findFreeIndex(thread_A_buffer)
This line is problematic, as it doesn't show any synchronization primitives. If the mechanism behind findFreeIndex relies only on memory written by thread A, this will work. If thread B, or any other thread, modifies that memory, then a further lock is necessary.
lock(m)
p = &(thread_A_buffer[index])
signal()
unlock(m)
This is covered by the following:
15 An evaluation A is dependency-ordered before an evaluation B if
— A performs a release operation on an atomic object M, and, in another thread, B performs a consume operation on M and reads a value written by any side effect in the release sequence headed by A, or
— for some evaluation X, A is dependency-ordered before X and X carries a dependency to B.
and
18 An evaluation A happens before an evaluation B if A is sequenced before B or A interthread happens before B.
The operations before the synchronization "happen before" the synchronization, and are guaranteed to be visible after the synchronization in the other thread.
The lock (an acquire) and unlock (a release) ensure there is a strict ordering: the writes in thread A are completed and visible to B.
thread_A_buffer[index] = 42; // happens before
At this point the write to thread_A_buffer is visible only to A; reading it from B without synchronization would cause undefined behavior.
lock(m); // acquire
The lock is required, but the acquire itself has no visible effect here; it is the release at unlock that matters for publishing.
p = &thread_A_buffer[index];
unlock(m);
All of A's prior writes are now visible to B (once B synchronizes on m).
thread_A_buffer[index] = 42; << This happens before and ...
p = &thread_A_buffer[index]; << carries a dependency into p
unlock(m);
Everything in A is now visible to B because:
An evaluation A inter-thread happens before an evaluation B if A synchronizes with B, A is dependency-ordered before B, or, for some evaluation X
— A synchronizes with X and X is sequenced before B,
— A is sequenced before X and X inter-thread happens before B, or
— A inter-thread happens before X and X inter-thread happens before B.
pointer p_a = null
do:
    // sleep
    lock(m)
    p_a = p
    unlock(m)
while (p_a != null)
This code is completely safe: the value read into p_a is ordered with respect to the other thread, and becomes null after the synchronized write in thread B. Again, lock/unlock enforce a strict ordering which ensures that the value read is the value written.
All of thread B's interactions are within a lock, so again completely safe.
If A were to modify the object after handing it to B, it would not work unless there were some further synchronization.