Recently I've peeked into the Linux kernel implementation of an atomic read and write and a few questions came up.
First the relevant code from the ia64 architecture:
typedef struct {
int counter;
} atomic_t;
#define atomic_read(v) (*(volatile int *)&(v)->counter)
#define atomic64_read(v) (*(volatile long *)&(v)->counter)
#define atomic_set(v,i) (((v)->counter) = (i))
#define atomic64_set(v,i) (((v)->counter) = (i))
For both read and write operations, it seems that the direct approach was taken to read from or write to the variable. Unless there is another trick somewhere, I do not understand what guarantees exist that this operation will be atomic in the assembly domain. I guess an obvious answer will be that such an operation translates to one assembly opcode, but even so, how is that guaranteed when taking into account the different memory cache levels (or other optimizations)?
On the read macros, the volatile type is used in a casting trick. Anyone has a clue how this affects the atomicity here? (Note that it is not used in the write operation)
The Linux kernel uses a large variety of "atomic" operations — operations that are indivisible as observed from anywhere within the system — to provide safe and efficient behavior in a multi-threaded environment.
During an atomic operation, a processor can read and write a location during the same data transmission. In this way, another input/output mechanism or processor cannot perform memory reading or writing tasks until the atomic operation has finished.
Atomic operations on the other hand provide instructions which complete in one instruction cycle. Since atomic instructions complete in one single instruction cycle, they are not interrupted by other CPUs trying to access the same memory location. This prevents race conditions.
Atomic operations are sequences of instructions that guarantee atomic accesses and updates of shared single word variables. This means that atomic operations cannot protect accesses to complex data structures in the way that locks can, but they provide a very efficient way of serializing access to a single word.
I think you are misunderstanding the (very much vague) usage of the word "atomic" and "volatile" here. Atomic only really means that the words will be read or written atomically (in one step, and guaranteeing that the contents of this memory position will always be one write or the other, and not something in between). And the volatile
keyword tells the compiler to never assume the data in that location due to an earlier read/write (basically, never optimize away the read).
What the words "atomic" and "volatile" do NOT mean here is that there's any form of memory synchronization. Neither implies ANY read/write barriers or fences. Nothing is guaranteed with regards to memory and cache coherence. These functions are basically atomic only at the software level, and the hardware can optimize/lie however it deems fit.
Now as to why simply reading is enough: the memory models for each architecture are different. Many architectures can guarantee atomic reads or writes for data aligned to a certain byte offset, or x words in length, etc. and vary from CPU to CPU. The Linux kernel contains many defines for the different architectures that let it do without any atomic calls (CMPXCHG
, basically) on platforms that guarantee (sometimes even only in practice even if in reality their spec says the don't actually guarantee) atomic reads/writes.
As for the volatile
, while there is no need for it in general unless you're accessing memory-mapped IO, it all depends on when/where/why the atomic_read
and atomic_write
macros are being called. Many compilers will (though it is not set in the C spec) generate memory barriers/fences for volatile variables (GCC, off the top of my head, is one. MSVC does for sure.). While this would normally mean that all reads/writes to this variable are now officially exempt from just about any compiler optimizations, in this case by creating a "virtual" volatile variable only this particular instance of a read/write is off-limits for optimization and re-ordering.
The reads are atomic on most major architectures, so long as they are aligned to a multiple of their size (and aren't bigger than the read size of a give type), see the Intel Architecture manuals. Writes on the other hand many be different, Intel states that under x86, single byte write and aligned writes may be atomic, under IPF (IA64), everything use acquire and release semantics, which would make it guaranteed atomic, see this.
the volatile
prevents the compiler from caching the value locally, forcing it to be retrieve where ever there is access to it.
If you write for a specific architecture, you can make assumptions specific to it.
I guess IA-64 does compile these things to a single instruction.
The cache shouldn't be an issue, unless the counter crosses a cache line boundry. But if 4/8 byte alignment is required, this can't happen.
A "real" atomic instruction is required when a machine instruction translates into two memory accesses. This is the case for increments (read, increment, write) or compare&swap.
volatile
affects the optimizations the compiler can do.
For example, it prevents the compiler from converting multiple reads into one read.
But on the machine instruction level, it does nothing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With