
Do I need an atomic if a value is only written?

Suppose I have several threads accessing the same memory location. If they write to it at all, they all write the same value, and none of them reads it. After that, all threads converge (through locks), and only then do I read the value. Do I need to use an atomic for this? This is for an x86_64 system. The value is an int32.

Dany Bittel asked Aug 25 '20 17:08




1 Answer

According to §5.1.2.4 ¶4 and ¶25 of the ISO C11 standard, two different threads writing to the same memory location with non-atomic operations, where neither write is ordered before the other, constitute a data race, and a data race is undefined behavior. The ISO C standard makes no exception to this rule if all threads are writing the same value.

Although writing a 32-bit integer to a 4-byte aligned address is guaranteed to be atomic by the Intel/AMD specifications for x86/x64 CPUs, such an operation is not guaranteed to be atomic by the ISO C standard, unless you are using a data type that is guaranteed to be atomic by the ISO C standard (such as atomic_int_least32_t). Therefore, even if your threads write a value of type int32_t to a 4-byte aligned address, according to the ISO C standard, your program will still cause undefined behavior.
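As a minimal sketch of the distinction above, the following C11 fragment declares the shared value with a standard atomic type and asks the implementation whether operations on it are lock-free. The names `shared_flag` and `flag_is_lock_free` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared variable (the name is an assumption, not from the question). */
static atomic_int_least32_t shared_flag;

/* The "least32" type is required to hold at least 32 bits. */
static_assert(sizeof(atomic_int_least32_t) * CHAR_BIT >= 32,
              "atomic_int_least32_t must be at least 32 bits wide");

/* Returns true if atomic operations on shared_flag need no internal lock. */
bool flag_is_lock_free(void)
{
    return atomic_is_lock_free(&shared_flag);
}
```

On common x86-64 toolchains this is expected to report true, in which case the atomic type gives you the standard's guarantee while still mapping onto the ordinary hardware store.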

However, for practical purposes, it is probably safe to assume that the compiler is generating assembly instructions that perform the operation atomically, provided that the alignment requirements are met.

Even if the memory writes were not aligned and the CPU did not execute the write instructions atomically, it is likely that your program would still work as intended. It should not matter if a write operation is split into two write operations, because all threads are writing the exact same value.

If you decide not to use an atomic variable, then you should at least declare the variable as volatile. Otherwise, the compiler may keep the variable only in a CPU register and never emit the store to memory, so that the other CPUs may never see any changes to that variable. Note that volatile does not make the access atomic and does not remove the data race as far as the ISO C standard is concerned.

So, to answer your question: It is probably not necessary to declare your variable as atomic. However, it is still highly recommended. Generally, all operations on variables that are accessed by several threads should either be atomic or be protected by a mutex. The only exception to this rule is if all threads are performing read-only operations on this variable.
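The recommended approach might look roughly like the sketch below, which is only one way to write it. It assumes POSIX threads, and it uses `pthread_join` as a stand-in for the lock-based convergence described in the question. The names `shared_value`, `writer`, `run_writers`, `MAX_WRITERS` and `WRITTEN_VALUE` are all hypothetical. Relaxed ordering is used because the join already orders the writes before the final read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITERS 16   /* arbitrary upper bound for this sketch */
#define WRITTEN_VALUE 42 /* the single value every thread writes */

/* Shared location, declared with a type the C standard guarantees to be atomic. */
static atomic_int_least32_t shared_value;

/* Every thread writes the same value and never reads the variable. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&shared_value, WRITTEN_VALUE, memory_order_relaxed);
    return NULL;
}

/* Starts n_threads writers, waits for all of them, then reads the value once. */
int32_t run_writers(int n_threads)
{
    pthread_t tids[MAX_WRITERS];
    assert(n_threads > 0 && n_threads <= MAX_WRITERS);

    atomic_store_explicit(&shared_value, 0, memory_order_relaxed);

    for (int i = 0; i < n_threads; i++) {
        int rc = pthread_create(&tids[i], NULL, writer, NULL);
        assert(rc == 0);
        (void)rc; /* silences the unused warning when NDEBUG is defined */
    }

    /* Joining stands in for the "converge through locks" step (an assumption). */
    for (int i = 0; i < n_threads; i++)
        pthread_join(tids[i], NULL);

    return (int32_t)atomic_load_explicit(&shared_value, memory_order_relaxed);
}
```

With GCC and Clang on x86-64, a relaxed atomic store of this size normally compiles to a plain `mov`, so the well-defined version should cost about the same as the non-atomic one.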

Relying on undefined behavior can be dangerous and is generally not recommended. In particular, if the compiler detects code that causes undefined behavior, it is allowed to treat that code as unreachable and optimize it away. In certain situations, some compilers actually do that. See this very interesting post by Microsoft blogger Raymond Chen for more information.

Also, beware that several threads writing to the same location (or even the same cache line) can disrupt the CPU pipeline, because the x86/x64 architecture guarantees strong memory ordering which must be enforced. If the CPU's cache coherency protocol detects a possible memory order violation due to another CPU writing to the same cache line, the whole CPU pipeline may have to be cleared. For this reason, it may be more efficient for all threads to write to different memory locations (in different cache lines, at least 64 bytes apart) and to analyze the written data after all threads have been synchronized.
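The per-thread layout suggested above could be sketched as follows. This assumes POSIX threads and a 64-byte cache line. The names `padded_slot`, `slots`, `worker`, `any_thread_wrote`, `CACHE_LINE` and `MAX_WORKERS` are hypothetical. Because each thread writes only to its own slot, no two threads touch the same memory location, so plain `int32_t` is sufficient here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */
#define MAX_WORKERS 16 /* arbitrary upper bound for this sketch */

/* Aligning the member to 64 bytes also pads the struct to 64 bytes,
 * so neighbouring array elements land in different cache lines. */
struct padded_slot {
    _Alignas(CACHE_LINE) int32_t value;
};

_Static_assert(sizeof(struct padded_slot) == CACHE_LINE,
               "each slot should occupy exactly one cache line");

static struct padded_slot slots[MAX_WORKERS];

/* Each thread writes only to the slot it was handed. */
static void *worker(void *arg)
{
    struct padded_slot *slot = arg;
    slot->value = 1;
    return NULL;
}

/* Runs n_threads workers, then combines their slots after all have joined. */
int32_t any_thread_wrote(int n_threads)
{
    pthread_t tids[MAX_WORKERS];
    assert(n_threads > 0 && n_threads <= MAX_WORKERS);

    for (int i = 0; i < n_threads; i++) {
        slots[i].value = 0;
        int rc = pthread_create(&tids[i], NULL, worker, &slots[i]);
        assert(rc == 0);
        (void)rc; /* silences the unused warning when NDEBUG is defined */
    }
    for (int i = 0; i < n_threads; i++)
        pthread_join(tids[i], NULL);

    /* All threads have finished, so reading every slot is race-free. */
    int32_t result = 0;
    for (int i = 0; i < n_threads; i++)
        result |= slots[i].value;
    return result;
}
```

The trade-off is memory: every slot takes a full cache line, which is usually acceptable for a small number of threads.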

Andreas Wenzel answered Oct 31 '22 07:10
