Atomically read/write int value w/o additional operation on the int value itself

Tags: macos, ios, multithreading, gcc, atomic

GCC offers a nice set of built-in functions for atomic operations, and on macOS or iOS Apple offers its own set of atomic functions as well. However, all of these functions perform an operation, e.g. an addition/subtraction, a logical operation (AND/OR/XOR), or a compare-and-set/compare-and-swap. What I am looking for is a way to atomically assign/read an int value, like:

int a;
/* ... */    
a = someVariable;

That's all. a will be read by another thread, and the only requirement is that a reader sees either the old value or the new value of a, never a mix of the two. Unfortunately, the C standard does not guarantee that assigning or reading a value is an atomic operation. I remember once reading somewhere that writing a value to, or reading a value from, a variable of type int is guaranteed to be atomic in GCC (regardless of the size of int), but I searched everywhere on the GCC homepage and cannot find this statement any longer (maybe it was removed).

I cannot use sig_atomic_t because sig_atomic_t has no guaranteed size and it might also have a different size than int.

Since only one thread will ever "write" a value to a, while both threads will "read" the current value of a, I don't need the whole read-modify-write sequence to be atomic, only the individual reads and writes, e.g.:

/* thread 1 */
someVariable = atomicRead(a);
/* Do something with someVariable, non-atomic, when done */
atomicWrite(a, someVariable);

/* thread 2 */
someVariable = atomicRead(a);
/* Do something with someVariable, but never write to a */

If both threads were going to write to a, then all operations would have to be atomic, but with a single writer that would only waste CPU time, and we are extremely low on CPU resources in our project. So far we use a mutex around read/write operations of a, and even though the mutex is held for only a tiny amount of time, this already causes problems (one of the threads is a realtime thread, and blocking on a mutex causes it to miss its realtime constraints, which is pretty bad).

Of course I could use __sync_fetch_and_add to read the variable (simply adding "0" to it so its value is not modified) and __sync_val_compare_and_swap to write it (as I know its old value, passing that in will make sure the value is always exchanged), but won't this add unnecessary overhead?

asked Aug 22 '11 14:08

Mecki

1 Answer

A __sync_fetch_and_add with a 0 argument is indeed the best bet if you want your load to be atomic and act as a memory barrier. Similarly, you can use an and with 0 (__sync_fetch_and_and) or an or with -1 (__sync_fetch_and_or) to store 0 or -1 atomically with a memory barrier. For writing, you can use __sync_lock_test_and_set (actually an xchg operation) if an "acquire" barrier is enough, or if using Clang you can use __sync_swap (which is an xchg operation with a full barrier).
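The builtin-based approach above can be sketched roughly as follows, assuming GCC or Clang and a naturally aligned int. The names shared_value, atomic_read_int, atomic_write_int, and sync_builtin_demo are hypothetical helpers made up for illustration, not part of any library. Because __sync_lock_test_and_set is documented as an acquire barrier only, this sketch issues a __sync_synchronize() before it; that is an assumption about what the caller wants and can be dropped if acquire semantics are enough.

```c
#include <assert.h>

/* Hypothetical shared variable: written by one thread, read by others. */
int shared_value = 0;

/* Atomic load with a full barrier: adding 0 leaves the value unchanged
 * and returns what was stored at the time of the operation. */
int atomic_read_int(int *p)
{
    return __sync_fetch_and_add(p, 0);
}

/* Atomic store via exchange. __sync_lock_test_and_set is only an acquire
 * barrier, so a full barrier is issued first (an assumption about the
 * caller's needs; remove it if acquire is sufficient). */
void atomic_write_int(int *p, int v)
{
    __sync_synchronize();
    (void)__sync_lock_test_and_set(p, v);
}

/* Single-threaded smoke test of the two helpers. */
int sync_builtin_demo(void)
{
    atomic_write_int(&shared_value, 42);
    int seen = atomic_read_int(&shared_value);
    assert(seen == 42);
    return seen;
}
```

Both helpers compile to locked read-modify-write instructions on x86, which is the overhead the question is worried about; they are shown here mainly as the fully fenced baseline.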

However, in many cases that's overkill and you may prefer to add memory barriers manually. If you do not want the memory barrier, you can use a volatile access to atomically read or write a variable that is aligned and no wider than a word:

#define __sync_access(x) (*(volatile __typeof__(x) *) &(x))

(This macro is an lvalue, so you can also use it for a store, like __sync_access(x) = 0.) The macro provides the same semantics as the C++11 memory_order_consume form, but only under two assumptions:

- that your machine has coherent caches; if not, you need a memory barrier or global cache flush before the load (or before the first of a group of loads).
- that your machine is not a DEC Alpha. The Alpha had very relaxed semantics for reordering memory accesses, so on it you'd need a memory barrier after the load (and after each load in a group of loads). On the Alpha the above macro only provides memory_order_relaxed semantics. BTW, the first versions of the Alpha couldn't even store a byte atomically (only a word, which was 8 bytes).

In either case, the __sync_fetch_and_add would work. As far as I know, no other machine imitated the Alpha so neither assumption should pose problems on current computers.
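As a rough illustration of combining the volatile-access macro with manually placed barriers, here is a minimal sketch assuming GCC or Clang, coherent caches, and aligned int variables. The variables payload and ready and the functions publish, try_consume, and volatile_access_demo are hypothetical names introduced only for this example; __sync_synchronize() is used as the manual full barrier.

```c
#include <assert.h>

/* Macro from the answer: a volatile access to an aligned, word-sized object. */
#define __sync_access(x) (*(volatile __typeof__(x) *) &(x))

/* Hypothetical shared state: `payload` is published by setting `ready`. */
int payload = 0;
int ready = 0;

/* Writer side: store the data, issue a full barrier, then set the flag. */
void publish(int value)
{
    __sync_access(payload) = value;
    __sync_synchronize();   /* order the payload store before the flag store */
    __sync_access(ready) = 1;
}

/* Reader side: returns 1 and fills *out once a value has been published. */
int try_consume(int *out)
{
    if (!__sync_access(ready))
        return 0;
    __sync_synchronize();   /* order the flag load before the payload load */
    *out = __sync_access(payload);
    return 1;
}

/* Single-threaded smoke test: nothing visible before publish, 7 after. */
int volatile_access_demo(void)
{
    int v = -1;
    int got = try_consume(&v);
    assert(got == 0);
    publish(7);
    got = try_consume(&v);
    assert(got == 1 && v == 7);
    return v;
}
```

The individual loads and stores stay plain memory accesses; only the two explicit barriers carry an ordering cost, and they can be omitted when a single int is the only shared datum and no ordering against other data is needed.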

answered Sep 27 '22 21:09

Paolo Bonzini
