Is function call an effective memory barrier for modern platforms?

Tags: c, multithreading, memory-barriers

In a codebase I reviewed, I found the following idiom.

    void notify(struct actor_t act) {
        write(act.pipe, "M", 1);
    }

    // thread A sending data to thread B
    void send(byte *data) {
        global.data = data;
        notify(threadB);
    }

    // in thread B event loop
    read(this.sock, &cmd, 1);
    switch (cmd) {
        case 'M': use_data(global.data); break;
        ...
    }

"Hold it", I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that global.data will be flushed from the cache to main memory. If thread A and thread B run on two different processors, this scheme might fail".

The senior programmer grinned, and explained slowly, as if explaining to his five-year-old boy how to tie his shoelaces: "Listen young boy, we've seen many thread-related bugs here, in high-load testing and at real clients", he paused to scratch his longish beard, "but we've never had a bug with this idiom".

"But, it says in the book..."

"Quiet!", he hushed me promptly, "Maybe theoretically, it's not guaranteed, but in practice, the fact you used a function call is effectively a memory barrier. The compiler will not reorder the instruction global.data = data, since it can't know if anyone is using it in the function call, and the x86 architecture will ensure that the other CPUs will see this piece of global data by the time thread B reads the command from the pipe. Rest assured, we have ample real-world problems to worry about. We don't need to invest extra effort in bogus theoretical problems.

"Rest assured my boy, in time you'll understand to separate the real problem from the I-need-to-get-a-PhD non-problems."

Is he correct? Is that really a non-issue in practice (say x86, x64 and ARM)?

It's against everything I learned, but he does have a long beard and a really smart look!

Extra points if you can show me a piece of code proving him wrong!

asked May 22 '12 08:05

mikebloch

2 Answers

Memory barriers aren't just there to prevent instruction reordering. Even if instructions aren't reordered, you can still run into problems with cache coherence. As for the reordering, it depends on your compiler and settings. ICC is particularly aggressive with reordering. MSVC with whole program optimization can be, too.

If your shared data variable is declared as volatile, then even though it's not in the spec, most compilers will generate a memory barrier around reads and writes of the variable and prevent reordering. This is not the correct way of using volatile, nor what it was meant for.

(If I had any votes left, I'd +1 your question for the narration.)

answered Sep 23 '22 01:09

Mahmoud Al-Qudsi

In practice, a function call is a compiler barrier, meaning that the compiler will not move global memory accesses past the call. A caveat to this is functions which the compiler knows something about, e.g. builtins, inlined functions (keep in mind IPO!) etc.
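To make the compiler-barrier versus CPU-barrier distinction concrete, here is a minimal sketch using C11 `<stdatomic.h>`. It assumes a C11 compiler; `shared_value`, `compiler_barrier`, `cpu_barrier` and `store_then_fence` are names made up for illustration, not anything from the question's codebase. On mainstream compilers `atomic_signal_fence` typically emits no instruction and only restricts compiler reordering, while `atomic_thread_fence` may also emit a hardware fence.

```c
#include <stdatomic.h>

static int shared_value;   /* hypothetical shared global, for illustration only */

/* Compiler-level barrier: the compiler may not move memory accesses
   across this point. It typically generates no CPU instruction. */
static inline void compiler_barrier(void) {
    atomic_signal_fence(memory_order_seq_cst);
}

/* Compiler barrier plus a hardware fence where the target needs one. */
static inline void cpu_barrier(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

/* Stores a value, then applies both kinds of barrier in turn. */
int store_then_fence(int value) {
    shared_value = value;   /* plain store to the global */
    compiler_barrier();     /* keeps the compiler from sinking the store */
    cpu_barrier();          /* also orders it against later memory operations */
    return shared_value;
}
```

An opaque function call gives you roughly what `compiler_barrier()` does here. It says nothing about what `cpu_barrier()` adds.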

So a processor memory barrier (in addition to a compiler barrier) is in theory needed to make this work. However, since you're calling read and write, which are syscalls that change global state, I'm fairly sure the kernel issues memory barriers somewhere in their implementation. There is no such guarantee, though, so in theory you need the barriers.
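If you would rather not rely on what the kernel happens to do, one option is to make the ordering explicit with C11 atomics. The following is only a sketch under assumptions: it uses POSIX threads and `pipe`, and `global_data`, `notify_pipe`, `send_data`, `receiver` and `run_round_trip` are hypothetical stand-ins for the question's `global.data`, pipe and event loop. Thread A publishes the pointer with a release store before writing to the pipe, and thread B reads it with an acquire load after reading the command byte.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the question's global.data. */
static _Atomic(const char *) global_data = NULL;
static int notify_pipe[2];   /* [0] = read end (thread B), [1] = write end (thread A) */

/* Thread A: publish the pointer with release semantics, then notify. */
static int send_data(const char *data) {
    atomic_store_explicit(&global_data, data, memory_order_release);
    return write(notify_pipe[1], "M", 1) == 1 ? 0 : -1;
}

/* Thread B: wait for the command byte, then load with acquire semantics. */
static void *receiver(void *out) {
    char cmd = 0;
    if (read(notify_pipe[0], &cmd, 1) == 1 && cmd == 'M') {
        *(const char **)out =
            atomic_load_explicit(&global_data, memory_order_acquire);
    }
    return NULL;
}

/* Runs one send/receive round trip and returns what thread B observed
   (NULL on failure). The calling thread plays the role of thread A. */
const char *run_round_trip(const char *message) {
    const char *seen = NULL;
    pthread_t thread_b;

    assert(message != NULL);
    if (pipe(notify_pipe) != 0) {
        return NULL;
    }
    if (pthread_create(&thread_b, NULL, receiver, &seen) == 0) {
        if (send_data(message) != 0) {
            close(notify_pipe[1]);   /* unblock thread B's read with EOF */
            notify_pipe[1] = -1;
        }
        pthread_join(thread_b, NULL);
    }
    close(notify_pipe[0]);
    if (notify_pipe[1] != -1) {
        close(notify_pipe[1]);
    }
    return seen;
}
```

Calling `run_round_trip(msg)` from `main` should hand back the same pointer that was passed in. On x86 the release store and acquire load usually compile to plain moves, so the explicit ordering costs little there, while on ARM they select the appropriate ordered instructions.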

answered Sep 24 '22 01:09

janneb
