
Safely passing read-only data to a new thread

Suppose I have a program that initializes a global variable for use by threads, like the following:

int ThreadParameter;

// this function runs from the main thread
void SomeFunction() {
    ThreadParameter = 5;

    StartThread(); // some function to start a thread
    // from this point on, ThreadParameter is NEVER modified.
}

// this function is run in a background worker thread created by StartThread();
void WorkerThread() {
    PrintValue(ThreadParameter); // we expect this to print "5"
}

These questions should apply for any generic processor architecture that one might encounter. I want the solution to be portable - not specific to an architecture with stronger memory guarantees, like x86.

  1. General question: despite being very common, is this really safe across all processor architectures? How to make it safe, if not?
  2. The global variable isn't volatile; is it possibly going to be reordered after the StartThread() call and leave me hosed? How to fix this problem?
  3. Suppose the computer has two processors that have their own caches. Main thread runs on first processor and worker thread runs on second processor. Suppose the memory block that contains ThreadParameter has been paged into each processor's cache before the program begins to run SomeFunction(). SomeFunction() writes 5 to ThreadParameter, which gets stored in the first processor's cache, and then starts the worker thread, which runs on the second processor. Won't WorkerThread() on the second processor see uninitialized data for ThreadParameter instead of the expected value of 5, since the memory page in the second processor hasn't yet seen the update from the first processor?
  4. If something different is required - how best to handle this given that rather than a simple int, I could be working with a pointer to much more complex data types that aren't necessarily used in a multithreaded environment?

If my concerns are unfounded, what are the specific reasons why I don't need to worry?

James Johnston asked Mar 21 '12 20:03


3 Answers

From your description, it seems you're writing to ThreadParameter (or some other data structure) BEFORE starting any child threads, and you will never write to ThreadParameter again. It exists to be read as needed, but never changes after its initialization; is that correct? If so, then there's no need whatsoever to employ any thread synchronization system calls (or processor/compiler primitives) every time a child thread wants to read the data, or even the first time for that matter.

The treatment of volatile is somewhat compiler-specific. I know that at least with Diab for PowerPC, there is a compiler option governing it: either emit the PowerPC EIEIO (or MBAR) instruction after every read/write of a volatile variable, or don't. This is in addition to prohibiting compiler optimizations associated with the variable. (EIEIO/MBAR is PowerPC's instruction for prohibiting reordering of I/O by the processor itself; i.e., all I/O from before the instruction must complete before any I/O after the instruction.)

From a correctness/safety standpoint, it doesn't hurt to declare it volatile. But from a pragmatic standpoint, if you initialize ThreadParameter far enough ahead of StartThread(), declaring it volatile shouldn't really be necessary (and leaving it non-volatile speeds up all subsequent accesses). Pretty much any substantial function call (say, to printf() or cout, or any system call) executes orders of magnitude more instructions than it takes for the processor to complete the write to ThreadParameter, so the write will have been handled long before your call to StartThread(). Realistically, StartThread() itself will almost certainly execute enough instructions before the thread in question actually starts. So I'm suggesting that you don't really need to declare it volatile, probably not even if you initialize it immediately before calling StartThread().

Now, as to your question about what would happen if the page containing that variable were already loaded into the caches of both processors before the main thread's processor performs the initialization: if you're using a commonly available general-purpose platform with like-kind CPUs, the hardware should already be in place to handle cache coherency for you. Where you do get into trouble with cache coherency on general-purpose platforms, multiprocessor or not, is when your processor has separate instruction and data caches and you write self-modifying code. The instructions written to memory are indistinguishable from data, so the CPU doesn't invalidate those locations in the instruction cache, and stale instructions may remain there unless you subsequently invalidate those locations yourself (either by issuing processor-specific assembly instructions, which you might not be allowed to do depending on your OS and your thread's privilege level, or by issuing the appropriate cache-invalidate system call for your OS). But what you're describing isn't self-modifying code, so you should be safe in that regard.

Your question 1 asks how to make this safe across ALL processor architectures. As I discussed above, you should be safe if you're using like-kind processors whose data buses are properly bridged. General-purpose processors designed for multiprocessor interconnection have bus snoop protocols to detect writes to shared memory, as long as your threading library properly configures the shared memory region. If you're working in an embedded system, you may have to configure that yourself in your BSP. For PowerPC, you need to look at the WIMG bits in your MMU/BAT configuration; I'm not familiar enough with other architectures to give you pointers on those. But if your system is homebrew, or if your processors are not like-kind, you may not be able to count on the two processors being able to snoop each other's writes; check with your hardware folks for advice.

phonetagger answered Oct 19 '22 02:10


When you create a new thread, the construction of the thread synchronizes with the start of the thread function. That means you're good - you write to ThreadParameter before creating the thread, and the threads access it after they start, so you can be sure that the write happens before the read, and so the threads are guaranteed to see the correct value.

(The implementation, meaning the compiler together with the threading library, is required to ensure that all writes done before the thread is started are visible within the new thread.)
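
To make the guarantee above concrete, here is a minimal sketch that assumes a C++11 (or later) compiler and that StartThread() is built on std::thread; the question does not say which threading API is actually used. ObservedValue is a hypothetical helper standing in for PrintValue() so that the result can be checked.

```cpp
#include <cassert>
#include <thread>

int ThreadParameter = 0;   // plain global: neither volatile nor atomic
int ObservedValue = -1;    // hypothetical helper standing in for PrintValue()

// Runs in the background worker thread.
void WorkerThread() {
    ObservedValue = ThreadParameter;
}

// Runs in the main thread; returns the value the worker observed.
int SomeFunction() {
    ThreadParameter = 5;               // written before the thread exists

    // Completion of the std::thread constructor synchronizes with the
    // start of WorkerThread, so the write above is visible to the worker.
    std::thread worker(WorkerThread);

    // From this point on, ThreadParameter is never modified.

    // Completion of the thread synchronizes with join() returning, so the
    // worker's write to ObservedValue is visible here.
    worker.join();
    assert(ObservedValue == 5);
    return ObservedValue;
}
```

Calling SomeFunction() from main() should return 5 on any conforming implementation, with no volatile and no explicit barrier. Some toolchains may need a flag such as -pthread to link it.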

Alan Stokes answered Oct 19 '22 02:10


  1. Yes, it is safe.
  2. Don't know. Maybe: if( ThreadParameter = 5 ) StartThread(); (the assignment inside the condition is intentional). However, in general, try not to second-guess the compiler.
  3. Probably not. If you had to worry about such low-level details when writing code, then the logic that controls how a program gets executed on a multi-core machine is probably not doing its job very well.
  4. Boost is your friend for working with complex types in a multi-threaded environment.
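
For point 4, one possible approach is sketched below. It uses the C++11 standard library rather than Boost (boost::thread and boost::shared_ptr offer similar facilities), and the Settings type and the function names are hypothetical, chosen only for illustration. The idea is to finish building the data before the thread starts and to hand the thread a pointer-to-const.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical "complex" parameter type, not written with threads in mind.
struct Settings {
    std::string name;
    std::vector<int> values;
};

// Runs in the worker thread; it sees the data only through a pointer-to-const.
void Worker(std::shared_ptr<const Settings> settings, int* sum) {
    assert(settings);                  // the caller must pass a non-null pointer
    int total = 0;
    for (int v : settings->values) total += v;
    *sum = total;
}

// Runs in the main thread: build everything first, then start the thread.
int SumInWorker() {
    auto settings = std::make_shared<Settings>();
    settings->name = "example";
    settings->values = {1, 2, 3};

    int sum = 0;
    // std::thread copies its arguments; the shared_ptr copy keeps the
    // Settings object alive for as long as the worker needs it.
    std::thread worker(Worker, std::shared_ptr<const Settings>(settings), &sum);
    worker.join();                     // after join(), the worker's write to sum is visible
    return sum;
}
```

Here SumInWorker() returns 6. The const qualifier only stops the worker from modifying the data through that pointer; a type with mutable members or internal caching could still need its own synchronization even when it is only read.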

Carl answered Oct 19 '22 03:10