Is it possible to use memory barriers only on the storing side

Tags: cpu-architecture, c, assembly, memory-barriers

First, some context: I'm working with a pre-C11, inline-asm-based atomic model, but for the purposes of this I'm happy to ignore the C aspect (and any compiler barrier issues, which I can deal with separately) and consider it essentially just an asm/cpu-architecture question.

Suppose I have code that looks like:

various stores
barrier
store flag
barrier

I want to be able to read flag from another cpu core and conclude that the various stores were already performed and made visible. Is it possible to do so without any kind of memory barrier instruction on the loading side? Clearly it's possible at least on some cpu architectures, for example x86 where an explicit memory barrier is not needed on either core. But what about in general? Does it vary widely by cpu arch whether this is possible?

asked Oct 10 '14 05:10

R.. GitHub STOP HELPING ICE

1 Answer

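If the CPU is allowed to reorder the loads, the reading core can see flag as set and still read stale values for the earlier stores, so your code needs a load barrier between the load of flag and the loads that follow it. Plenty of architectures do such reordering; see the table in the Wikipedia article "Memory ordering" for some examples.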

Thus in the general case your code does require load barriers.

x86 is atypical in that it provides fairly strong memory ordering guarantees. See "Who ordered memory fences on an x86?" for a discussion.
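To make the pairing concrete, here is a minimal sketch of the pattern with a barrier on both sides. It uses C11 <stdatomic.h> fences as a portable stand-in for the inline-asm barriers in the question, which is an assumption on my part since the question is about a pre-C11 model. The names payload, publish and try_consume are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; these names are not from the question. */
int payload[2];        /* stands in for the "various stores" */
atomic_int flag;       /* zero-initialized: nothing published yet */

/* Storing side: various stores, barrier, store flag. */
void publish(int a, int b)
{
    payload[0] = a;
    payload[1] = b;
    /* Store-side barrier: the payload stores become visible before flag. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Loading side: load flag, barrier, then load the data. */
bool try_consume(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return false;  /* not published yet */
    /* Load-side barrier: keeps the payload loads below from being
       performed ahead of the flag load above. */
    atomic_thread_fence(memory_order_acquire);
    *a = payload[0];
    *b = payload[1];
    return true;
}
```

In real use publish would run on one core and try_consume on another. On x86 both fences should compile to no barrier instruction, only a compiler barrier, while on weakly ordered architectures such as ARM or POWER the acquire fence becomes a real instruction, and that is the one the loading side cannot drop in general.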

answered Oct 29 '22 16:10

NPE