
What does asm volatile do in C?

Tags:

c

gcc

inline-assembly

I looked into some C code from
http://www.mcs.anl.gov/~kazutomo/rdtsc.html
They use stuff like __inline__, __asm__ etc like the following:

code1:

    static __inline__ tick gettick (void) {
        unsigned a, d;
        __asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) );
        return (((tick)a) | (((tick)d) << 32));
    }

code2:

    volatile int  __attribute__((noinline)) foo2 (int a0, int a1) {
        __asm__ __volatile__ ("");
    }

I was wondering what code1 and code2 do.

(Editor's note: for this specific RDTSC use case, intrinsics are preferred: How to get the CPU cycle count in x86_64 from C++? See also https://gcc.gnu.org/wiki/DontUseInlineAsm)

220

asked Oct 19 '14 23:10

user3692521

1 Answer

The __volatile__ modifier on an __asm__ block tells the compiler's optimizer to keep the statement as written. Without it, the optimizer may decide the statement can be removed outright (if its outputs are unused), or hoisted out of a loop so that its result is computed once and reused.

This is useful for the rdtsc instruction like so:

    __asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) );

This statement has no input operands, so without volatile the compiler might assume its outputs never change and reuse a cached value. Volatile is used to force it to read a fresh timestamp every time.
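As a rough sketch of the point above, the question's gettick could be written with fixed-width types as follows. The names combine_halves and read_tsc are made up for this illustration, and the non-x86 branch that returns 0 is only a placeholder assumption so the snippet still builds elsewhere.

```c
#include <stdint.h>

/* Join the two 32-bit halves that rdtsc leaves in EAX (low) and EDX (high).
   Hypothetical helper name, not part of the original code. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Hypothetical helper: returns the current time-stamp counter on x86. */
static inline uint64_t read_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    /* volatile: each call must emit its own rdtsc instead of reusing
       the result of an earlier, identical asm statement. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_halves(lo, hi);
#else
    return 0; /* assumption for this sketch: no TSC outside x86 */
#endif
}
```

Calling read_tsc() twice in a row should then produce two separate rdtsc instructions in the generated code, which is the behavior volatile is there to guarantee.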

When used alone, like this:

    __asm__ __volatile__ ("");

it will not actually execute anything, because the asm block emits no instructions. You can extend this, though, to get a compile-time memory barrier that won't allow the compiler to reorder memory accesses across it:

    __asm__ __volatile__ ("":::"memory");
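A small sketch of how such a compile-time barrier might be used: fill a buffer, then set a flag, with the barrier between the two. The names compiler_barrier, payload, ready and publish are invented for this example. Note that this only constrains the compiler; it does not by itself order the stores as seen by other CPU cores.

```c
#include <stddef.h>

/* Hypothetical helper: emits no instructions, but the "memory" clobber
   tells the compiler that memory may be read or written here, so it
   cannot move loads or stores across this point. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

static int payload[4]; /* illustrative data buffer */
static int ready;      /* illustrative "data is ready" flag */

/* Write the payload first, then raise the flag. The barrier keeps the
   compiler from sinking the payload stores below the flag store.
   Ordering between CPU cores would additionally need a hardware fence
   or C11 atomics. */
static void publish(int value)
{
    for (size_t i = 0; i < 4; ++i)
        payload[i] = value;
    compiler_barrier();
    ready = 1;
}
```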

The rdtsc instruction is a good example for volatile. rdtsc is usually used when you need to time how long some instructions take to execute. Imagine some code like this, where you want to time r1 and r2's execution:

    __asm__ ("rdtsc": "=a" (a0), "=d" (d0) );
    r1 = x1 + y1;
    __asm__ ("rdtsc": "=a" (a1), "=d" (d1) );
    r2 = x2 + y2;
    __asm__ ("rdtsc": "=a" (a2), "=d" (d2) );

Here the compiler is actually allowed to cache the timestamp, and valid output might show that each line took exactly 0 clocks to execute. Obviously this isn't what you want, so you introduce __volatile__ to prevent caching:

    __asm__ __volatile__("rdtsc": "=a" (a0), "=d" (d0));
    r1 = x1 + y1;
    __asm__ __volatile__("rdtsc": "=a" (a1), "=d" (d1));
    r2 = x2 + y2;
    __asm__ __volatile__("rdtsc": "=a" (a2), "=d" (d2));

Now you'll get a new timestamp each time, but there is still a problem: both the compiler and the CPU are allowed to reorder the surrounding statements relative to the asm blocks. The asm blocks could end up executing after r1 and r2 have already been calculated. To work around this, you'd add some barriers that force serialization:

    __asm__ __volatile__("mfence;rdtsc": "=a" (a0), "=d" (d0) :: "memory");
    r1 = x1 + y1;
    __asm__ __volatile__("mfence;rdtsc": "=a" (a1), "=d" (d1) :: "memory");
    r2 = x2 + y2;
    __asm__ __volatile__("mfence;rdtsc": "=a" (a2), "=d" (d2) :: "memory");

Note the mfence instruction here, which enforces a CPU-side barrier, and the "memory" clobber in the volatile block, which enforces a compile-time barrier. On modern CPUs, you can replace mfence;rdtsc with rdtscp for something more efficient. Keep in mind that rdtscp also writes ECX, so the asm statement must list "ecx" as a clobber or as a third output.
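Putting the pieces together, a timing helper might look roughly like the sketch below. It swaps in lfence where the snippet above uses mfence, because Intel's manuals describe lfence as the fence to place before rdtsc when earlier instructions must finish first; treat that substitution as this example's assumption. The names ordered_tsc, ticks_for and add_up are hypothetical, and targets other than x86-64 simply get 0.

```c
#include <stdint.h>

/* Hypothetical helper: fenced timestamp read.
   - lfence keeps rdtsc from starting before earlier instructions finish
   - "memory" stops the compiler from moving memory accesses across it
   - volatile stops the compiler from deleting or merging the statement */
static inline uint64_t ordered_tsc(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("lfence\n\trdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    return (uint64_t)lo | ((uint64_t)hi << 32);
#else
    return 0; /* assumption for this sketch: only x86-64 is handled */
#endif
}

/* Time one call of fn(arg), in TSC ticks. */
static uint64_t ticks_for(void (*fn)(void *), void *arg)
{
    uint64_t start = ordered_tsc();
    fn(arg);
    uint64_t end = ordered_tsc();
    return end - start;
}

/* Example workload: sum 1..100 into the long that arg points to. */
static void add_up(void *arg)
{
    long *out = arg;
    long total = 0;
    for (int i = 1; i <= 100; ++i)
        total += i;
    *out = total;
}
```

A call such as ticks_for(add_up, &sum) returns the tick difference around the workload. The number varies from run to run and includes the cost of the fences themselves, so in practice you would repeat the measurement and look at the minimum or median.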

198

answered Sep 30 '22 02:09

Cory Nelson
