Recently, I am reading some Linux kernel space codes, I see this
uint64_t used;
uint64_t blocked;
used = atomic64_read(&g_variable->used); //#1
barrier(); //#2
blocked = atomic64_read(&g_variable->blocked); //#3
What is the semantics of this code snippet? Does it make sure #1 executes before #3 by #2. But I am a litter bit confused, becasue
#A In 64 bit platform, atomic64_read macro is expanded to
used = (&g_variable->used)->counter // where counter is volatile.
In 32 bits platform, it was converted to use lock cmpxchg8b. I assume these two have the same semantic, and for 64 bits version, I think it means:
atomic64_read doesn't have semantic for preserve read ordering!!! see this
#B the barrier macro is defined as
/* Optimization barrier */
/* The "volatile" is due to gcc bugs */
#define barrier() __asm__ __volatile__("": : :"memory")
From the wiki this just prevents gcc compiler from reordering read and write.
What i am confused is how does it disable reorder optimization for CPU? In addition, can i think barrier macro is full fence?
In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
The memory barrier instructions halt execution of the application code until a memory write of an instruction has finished executing. They are used to ensure that a critical section of code has been completed before continuing execution of the application code.
A write memory barrier gives a guarantee that all the STORE operations specified before the barrier will appear to happen before all the STORE operations specified after the barrier with respect to the other components of the system.
Creates a hardware memory barrier (fence) that prevents the CPU from re-ordering read and write operations. It may also prevent the compiler from re-ordering read and write operations.
32-bit x86 processors don't provide simple atomic read operations for 64-bit types. The only atomic operation on 64-bit types on such CPUs that deals with "normal" registers is LOCK CMPXCHG8B
, which is why it is used here. The alternative is to use MOVQ
and MMX/XMM registers, but that requires knowledge of the FPU state/registers, and requires that all operations on that value are done with the MMX/XMM instructions.
On 64-bit x86_64 processors, aligned reads of 64-bit types are atomic, and can be done with a MOV
instruction, so only a plain read is required --- the use of volatile
is just to ensure that the compiler actually does a read, and doesn't cache a previous value.
As for the read ordering, the inline assembler you quote ensures that the compiler emits the instructions in the right order, and this is all that is required on x86/x86_64 CPUs, provided the writes are correctly sequenced. LOCK
ed writes on x86 have a total ordering; plain MOV
writes provide "causal consistency", so if thread A does x=1
then y=2
, if thread B reads y==2
then a subsequent read of x
will see x==1
.
On IA-64, PowerPC, SPARC, and other processors with a more relaxed memory model there may well be more to atomic64_read()
and barrier()
.
x86 CPUs don’t do read-after-read reordering, so it is sufficient to prevent the compiler from doing any reordering. On other platforms such as PowerPC, things will look a lot different.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With