Read and Write atomic operation implementation in the Linux Kernel

Recently I've peeked into the Linux kernel implementation of an atomic read and write and a few questions came up.

First the relevant code from the ia64 architecture:

typedef struct {
    int counter;
} atomic_t;

#define atomic_read(v)      (*(volatile int *)&(v)->counter)
#define atomic64_read(v)    (*(volatile long *)&(v)->counter)

#define atomic_set(v,i)     (((v)->counter) = (i))
#define atomic64_set(v,i)   (((v)->counter) = (i))

For both read and write operations, it seems that the direct approach was taken: simply read from or write to the variable. Unless there is another trick somewhere, I do not understand what guarantees that this operation will be atomic at the assembly level. I guess the obvious answer is that such an operation translates to a single instruction, but even so, how is that guaranteed when taking into account the different levels of memory cache (or other optimizations)?
In the read macros, the pointer is cast to a volatile type. Does anyone have a clue how this affects the atomicity here? (Note that it is not used in the write operation.)

asked Feb 15 '12 07:02

EdwardH

3 Answers

I think you are misunderstanding the (rather vague) use of the words "atomic" and "volatile" here. Atomic only really means that the word will be read or written atomically: in one step, with the guarantee that the contents of the memory location will always be one write or the other, never something in between. The volatile keyword tells the compiler never to assume the contents of that location based on an earlier read or write (basically, never optimize away the read).
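To make "something in between" concrete, here is a minimal single-threaded sketch of a torn read. It is a simulation, not real concurrency: the type `split64_t` and the function `torn_read_demo` are hypothetical names invented for illustration, modelling a machine that can only store 32 bits at a time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit value on a machine that can only
 * store 32 bits at a time, so one logical write takes two steps. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} split64_t;

static uint64_t split_read(const split64_t *s)
{
    return ((uint64_t)s->hi << 32) | s->lo;
}

/* Returns what a reader would see if it ran between the two halves
 * of the write. Single-threaded simulation only. */
uint64_t torn_read_demo(uint64_t old_val, uint64_t new_val)
{
    split64_t s = { (uint32_t)old_val, (uint32_t)(old_val >> 32) };

    s.lo = (uint32_t)new_val;          /* first half of the write */
    uint64_t seen = split_read(&s);    /* reader sneaks in here */
    s.hi = (uint32_t)(new_val >> 32);  /* second half of the write */

    assert(split_read(&s) == new_val); /* the write did complete */
    return seen;
}
```

Going from 0x00000000FFFFFFFF to 0x0000000100000000, the simulated reader sees 0, which is neither the old nor the new value. An atomic read or write is one where this cannot happen.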

What the words "atomic" and "volatile" do NOT mean here is that there's any form of memory synchronization. Neither implies ANY read/write barriers or fences. Nothing is guaranteed with regard to memory ordering or when other CPUs observe the access. These functions are basically atomic only at the software level, and the hardware can optimize/reorder however it sees fit.

Now as to why simply reading is enough: the memory models for each architecture are different. Many architectures can guarantee atomic reads or writes for data aligned to a certain byte offset, or up to x words in length, etc., and this varies from CPU to CPU. The Linux kernel contains many defines for the different architectures that let it do without any atomic instructions (CMPXCHG, basically) on platforms that guarantee atomic reads/writes (sometimes only in practice, even if their spec does not actually guarantee it).

As for the volatile: while there is generally no need for it unless you're accessing memory-mapped I/O, it all depends on when, where, and why the atomic_read and atomic_set macros are being called. Some compilers give volatile accesses ordering guarantees beyond what the C spec requires (MSVC, by default on x86, treats volatile reads and writes as acquire and release operations), whereas GCC emits no hardware fences for volatile and only keeps volatile accesses in order relative to one another. Declaring a variable volatile would normally mean that all reads/writes to it are exempt from just about any compiler optimization; in this case, by creating a "virtual" volatile variable through the cast, only this particular read is off-limits for optimization and reordering.
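A minimal sketch of that cast trick outside the kernel, assuming a compiler that honors volatile-qualified access through a cast (GCC and Clang do in practice). The names `my_atomic_t`, `my_atomic_read`, `my_atomic_set` and `spin_until_set` are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel type quoted in the question. */
typedef struct {
    int counter;
} my_atomic_t;

/* Only this one access is volatile; the object itself is not. */
#define my_atomic_read(v)    (*(volatile int *)&(v)->counter)
#define my_atomic_set(v, i)  (((v)->counter) = (i))

/* Polls the flag until it becomes non-zero or max_spins is reached,
 * and returns the number of iterations spent waiting. */
long spin_until_set(my_atomic_t *flag, long max_spins)
{
    long spins = 0;

    assert(max_spins >= 0);
    /* The volatile cast makes the compiler reload flag->counter on
     * every iteration instead of hoisting the load out of the loop. */
    while (my_atomic_read(flag) == 0 && spins < max_spins)
        spins++;
    return spins;
}
```

With a plain `flag->counter` read, an optimizing compiler would be free to load the value once and loop on a register copy. Note that the cast affects only what the compiler does; it adds no fence or barrier instruction.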

answered Nov 16 '22 02:11

Mahmoud Al-Qudsi

The reads are atomic on most major architectures, so long as they are aligned to a multiple of their size (and aren't bigger than the read size of a given type); see the Intel architecture manuals. Writes, on the other hand, may be different: Intel states that under x86, single-byte writes and aligned writes may be atomic; under IPF (IA64), everything uses acquire and release semantics, which would make it guaranteed atomic, see this.

The volatile prevents the compiler from caching the value locally (in a register, for example), forcing it to be retrieved from memory wherever it is accessed.

answered Nov 16 '22 00:11

Necrolis

If you write for a specific architecture, you can make assumptions specific to it.
I guess IA-64 does compile these things to a single instruction.

The cache shouldn't be an issue, unless the counter crosses a cache line boundary. But if 4/8 byte alignment is required, this can't happen.
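The alignment argument can be checked with a little arithmetic. The sketch below assumes a 64-byte cache line, which is common but not universal; `is_naturally_aligned` and `crosses_cache_line` are hypothetical helpers that treat addresses as plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of the access size. */
int is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0);
    return (addr % size) == 0;
}

/* True if the bytes [addr, addr + size) span two cache lines.
 * line_size is an assumption supplied by the caller (64 is typical). */
int crosses_cache_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(size != 0 && line_size != 0);
    return (addr / line_size) != ((addr + size - 1) / line_size);
}
```

An 8-byte access at address 60 spans bytes 60 to 67 and so straddles the boundary at 64, while any 8-byte access at a multiple of 8 stays inside one 64-byte line. The same holds whenever the line size is a multiple of the access size.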

A "real" atomic instruction is required when a machine instruction translates into two memory accesses. This is the case for increments (read, increment, write) or compare&swap.
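As a rough illustration of why an increment is different, the sketch below first replays "counter++" from two CPUs as separate load/add/store steps in one unlucky interleaving (a single-threaded simulation), then shows an increment built on C11 compare-and-swap from `<stdatomic.h>`. The function names `lost_update_demo` and `cas_increment` are hypothetical, and this is userspace C11 rather than the kernel's own atomic API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simulates two CPUs each doing "counter++" as load, add, store,
 * interleaved so that both load before either one stores. */
int lost_update_demo(void)
{
    int counter = 0;

    int reg_a = counter;  /* CPU A: load */
    int reg_b = counter;  /* CPU B: load, sees the same old value */
    reg_a += 1;           /* CPU A: increment in a register */
    reg_b += 1;           /* CPU B: increment in a register */
    counter = reg_a;      /* CPU A: store */
    counter = reg_b;      /* CPU B: store, overwriting A's update */

    return counter;       /* one increment has been lost */
}

/* Increment built from compare-and-swap: the store only succeeds if
 * nobody changed the value since it was loaded; otherwise retry. */
int cas_increment(atomic_int *v)
{
    assert(v != NULL);
    int old = atomic_load(v);
    while (!atomic_compare_exchange_weak(v, &old, old + 1)) {
        /* On failure, old now holds the current value; try again. */
    }
    return old + 1;
}
```

Each individual load and store in the first function is atomic on its own, yet the sequence as a whole is not, which is why a plain read or write can get away without a special instruction while an increment cannot.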

volatile affects the optimizations the compiler can do.
For example, it prevents the compiler from converting multiple reads into one read.
But on the machine instruction level, it does nothing.

answered Nov 16 '22 02:11

ugoren

Tags:

c

atomic

linux-kernel

volatile