
What does __asm__ __volatile__ do in C?

I was looking at some C code from
http://www.mcs.anl.gov/~kazutomo/rdtsc.html
It uses things like __inline__ and __asm__, as in the following:

code1:

static __inline__ tick gettick (void) {
    unsigned a, d;
    __asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) );
    return (((tick)a) | (((tick)d) << 32));
}

code2:

volatile int  __attribute__((noinline)) foo2 (int a0, int a1) {
    __asm__ __volatile__ ("");
}

What do code1 and code2 do?

(Editor's note: for this specific RDTSC use case, intrinsics are preferred: How to get the CPU cycle count in x86_64 from C++? See also https://gcc.gnu.org/wiki/DontUseInlineAsm)

user3692521 asked Oct 19 '14 23:10




1 Answer

The __volatile__ modifier on an __asm__ block tells the compiler's optimizer to emit the code as-is. Without it, the optimizer may decide the block can be either removed outright, or hoisted out of a loop with its result reused.

This is useful for the rdtsc instruction like so:

__asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) ) 

This asm statement has no input operands, so the compiler might assume its outputs never change and reuse an earlier result. __volatile__ forces it to read a fresh timestamp every time.
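As a rough sketch of the loop case, the statement above can be wrapped in a helper and called repeatedly. This assumes an x86 target and a GCC-compatible compiler; the names read_tsc and sample_tsc are made up for illustration and do not come from the linked page.

```c
#include <stddef.h>
#include <stdint.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "This sketch assumes an x86 target (rdtsc is an x86 instruction)"
#endif

/* Hypothetical helper: read the 64-bit time stamp counter.
 * rdtsc puts the low 32 bits in EAX ("=a") and the high 32 bits in EDX ("=d"). */
static inline uint64_t read_tsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: fill out[0..n-1] with successive counter reads.
 * Because the asm is volatile, the compiler keeps one rdtsc per iteration
 * instead of hoisting a single read out of the loop. */
static inline void sample_tsc(uint64_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = read_tsc();
    }
}
```

Calling sample_tsc(buf, 4) should leave four separately read counter values in buf. Without __volatile__, the compiler would be free to read the counter once and store the same value four times.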

When used alone, like this:

__asm__ __volatile__ ("") 

It emits no instructions at all. You can extend this, though, to get a compile-time memory barrier that stops the compiler from reordering memory accesses across it (the CPU may still reorder them at run time):

__asm__ __volatile__ ("":::"memory") 
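A minimal sketch of how such a barrier might be used, assuming a GCC-compatible compiler. The macro COMPILER_BARRIER, the variables payload and ready, and the function publish are hypothetical names chosen for this example.

```c
/* The compile-time barrier from above, wrapped in a hypothetical macro. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int payload; /* hypothetical data to publish */
int ready;   /* hypothetical flag that is set after the data is written */

/* Hypothetical helper: write the data first, then raise the flag.
 * The barrier asks the compiler not to move the two stores across it,
 * so the store to payload is emitted before the store to ready.
 * It emits no instruction and does not constrain the CPU. */
static inline void publish(int value) {
    payload = value;
    COMPILER_BARRIER();
    ready = 1;
}
```

This only controls the order in which the compiler emits the stores. Code that needs ordering to be visible to other cores would also need a CPU-level fence or atomic operations.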

The rdtsc instruction is a good example for volatile. rdtsc is usually used when you need to time how long some instructions take to execute. Imagine some code like this, where you want to time r1 and r2's execution:

__asm__ ("rdtsc": "=a" (a0), "=d" (d0) );
r1 = x1 + y1;
__asm__ ("rdtsc": "=a" (a1), "=d" (d1) );
r2 = x2 + y2;
__asm__ ("rdtsc": "=a" (a2), "=d" (d2) );

Here the compiler is actually allowed to cache the timestamp, and valid output might show that each line took exactly 0 clocks to execute. Obviously this isn't what you want, so you introduce __volatile__ to prevent caching:

__asm__ __volatile__("rdtsc": "=a" (a0), "=d" (d0));
r1 = x1 + y1;
__asm__ __volatile__("rdtsc": "=a" (a1), "=d" (d1));
r2 = x2 + y2;
__asm__ __volatile__("rdtsc": "=a" (a2), "=d" (d2));

Now you'll get a new timestamp each time, but there is still a problem: both the compiler and the CPU are allowed to reorder the calculations relative to the timestamp reads. The asm blocks could end up executing after r1 and r2 have already been calculated. To work around this, you'd add some barriers that force ordering:

__asm__ __volatile__("mfence;rdtsc": "=a" (a0), "=d" (d0) :: "memory");
r1 = x1 + y1;
__asm__ __volatile__("mfence;rdtsc": "=a" (a1), "=d" (d1) :: "memory");
r2 = x2 + y2;
__asm__ __volatile__("mfence;rdtsc": "=a" (a2), "=d" (d2) :: "memory");

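Note the mfence instruction here, which enforces a CPU-side barrier, and the "memory" clobber in the volatile block, which enforces a compile-time barrier. On CPUs that support it, you can replace mfence;rdtsc with rdtscp for something more efficient. rdtscp waits for earlier instructions to execute before reading the counter, but later instructions may still begin before the read, and it also overwrites ECX, so ECX must be listed as a clobber.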
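Putting the pieces together, a timed region might look like the sketch below. It assumes an x86-64 target and a GCC-compatible compiler, and the names fenced_tsc and time_sum are invented for this example. The result is in raw counter ticks, and how those relate to core clock cycles depends on the CPU.

```c
#include <stdint.h>

#if !defined(__x86_64__)
#error "This sketch assumes an x86-64 target and a GCC-compatible compiler"
#endif

/* Hypothetical helper: the fenced timestamp read from the listing above.
 * mfence is the CPU-side barrier; the "memory" clobber is the compile-time one. */
static inline uint64_t fenced_tsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("mfence; rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: sum n ints and return the elapsed counter ticks.
 * The sum is written through sum_out before the second read, so the
 * "memory" clobber keeps that store ahead of the closing timestamp. */
static inline uint64_t time_sum(const int *values, int n, long *sum_out) {
    uint64_t start = fenced_tsc();
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += values[i];
    }
    *sum_out = sum;
    uint64_t end = fenced_tsc();
    return end - start;
}
```

A single measurement like this is noisy. In practice you would repeat it many times and look at the minimum or the median.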

Cory Nelson answered Sep 30 '22 02:09
