What does __asm__ __volatile__ () basically do, and what is the significance of "memory" on the ARM architecture?


Working of asm volatile ("" : : : "memory")


asm volatile("" ::: "memory");

creates a compiler-level memory barrier that forces the optimizer not to reorder memory accesses across the barrier.
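Read piece by piece, the statement has an empty assembler template (so no instructions are emitted), no output operands, no input operands, and a "memory" clobber telling GCC that the statement may read or write any memory, so the compiler must not keep memory values cached in registers across it or move memory accesses past it. The volatile qualifier tells GCC not to delete the statement or move it out of loops; an asm with no outputs is implicitly volatile anyway, and the ordering itself comes from the clobber. Below is a minimal sketch that wraps the statement in a macro; compiler_barrier and publish are hypothetical names chosen for illustration (the Linux kernel defines a similar macro called barrier()).

```c
/* Hypothetical helper name. Parts of the statement:
 *   ""       : empty template, no instructions are emitted
 *   :        : no output operands
 *   :        : no input operands
 *   "memory" : clobber, the asm may read/write any memory, so the
 *              compiler may not reorder memory accesses across it */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical example: write a payload, then a "ready" flag next to it.
 * Without the barrier the compiler is free to emit the two stores in
 * either order. This is compile-time ordering only; the CPU may still
 * reorder the stores. */
void publish(int *slot, int value)
{
    slot[0] = value;     /* payload first */
    compiler_barrier();  /* compiler must emit the store above before the one below */
    slot[1] = 1;         /* then the flag */
}
```

Calling publish(buf, 42) on an int buf[2] leaves buf[0] == 42 and buf[1] == 1; the barrier changes only the order in which the compiler emits the two stores, not their values.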

For example, if you need to access some addresses in a specific order (perhaps because that memory region is actually backed by a device rather than by RAM), you need to be able to tell this to the compiler; otherwise it may reorder your steps for the sake of efficiency.

Assume that in this scenario you must increment a value at one address, read something, and then increment another value at an adjacent address:

int c(int *d, int *e) {
        int r;
        d[0] += 1;
        r = e[0];
        d[1] += 1;
        return r;
}

The problem is that the compiler (gcc in this case) can rearrange your memory accesses to get better performance if you ask it to (-O), possibly leading to a sequence of instructions like the one below:

00000000 <c>:
   0:   4603        mov     r3, r0
   2:   c805        ldmia   r0, {r0, r2}
   4:   3001        adds    r0, #1
   6:   3201        adds    r2, #1
   8:   6018        str     r0, [r3, #0]
   a:   6808        ldr     r0, [r1, #0]
   c:   605a        str     r2, [r3, #4]
   e:   4770        bx      lr

Above, the values of d[0] and d[1] are loaded at the same time by a single ldmia, so d[1] is read before e[0] rather than after it. Let's assume this is something you want to avoid; then you need to tell the compiler not to reorder memory accesses, and the way to do that is asm volatile("" ::: "memory"):

int c(int *d, int *e) {
        int r;
        d[0] += 1;
        r = e[0];
        asm volatile("" ::: "memory");
        d[1] += 1;
        return r;
}

Now you get the instruction sequence you want, with d[1] loaded only after e[0] has been read:

00000000 <c>:
   0:   6802        ldr     r2, [r0, #0]
   2:   4603        mov     r3, r0
   4:   3201        adds    r2, #1
   6:   6002        str     r2, [r0, #0]
   8:   6808        ldr     r0, [r1, #0]
   a:   685a        ldr     r2, [r3, #4]
   c:   3201        adds    r2, #1
   e:   605a        str     r2, [r3, #4]
  10:   4770        bx      lr
  12:   bf00        nop
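The barrier changes the order in which the generated code touches memory, not the result a single thread observes: the compiler has to preserve C semantics in both versions, including the case where e points into d. A small sketch to check this, with the two versions renamed c_plain and c_barrier (hypothetical names) so they can live in one file:

```c
/* Version without the barrier (first listing). */
int c_plain(int *d, int *e)
{
    int r;
    d[0] += 1;
    r = e[0];
    d[1] += 1;
    return r;
}

/* Version with the compiler-only barrier (second listing). */
int c_barrier(int *d, int *e)
{
    int r;
    d[0] += 1;
    r = e[0];
    __asm__ __volatile__("" ::: "memory");  /* no instruction, ordering only */
    d[1] += 1;
    return r;
}
```

For example, on a fresh array int d[2] = {10, 20}, each of c_plain(d, &d[1]) and c_barrier(d, &d[1]) returns 20 and leaves d as {11, 21}. The difference only matters to something outside the program's own view of memory, such as a device mapped at those addresses.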

Note that this is only a compile-time memory barrier: it stops the compiler from reordering memory accesses, but it emits no hardware instructions to flush memory or to wait for loads or stores to complete. The CPU can still reorder memory accesses if the architecture allows it and the addresses are mapped as Normal memory rather than Strongly-ordered or Device memory (ref).
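If the CPU must also respect the order, a hardware barrier is needed in addition to the compiler barrier. One portable sketch uses C11 <stdatomic.h>: in GCC, atomic_signal_fence acts as a compiler-only fence comparable to the asm statement above, while atomic_thread_fence(memory_order_seq_cst) also emits a hardware barrier (with GCC on ARMv7, typically a dmb instruction). Treat this as a sketch under assumptions: the C standard defines these fences in terms of atomic operations, so relying on them to order plain or device accesses depends on how the compiler implements them; for device memory, check the generated code or use the barrier instructions your platform documents. The function names below are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical name. Compiler-only fence: in GCC this blocks compiler
 * reordering of memory accesses but emits no instruction, much like
 * asm volatile("" ::: "memory"). */
static inline void compiler_only_fence(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Hypothetical name. Compiler fence plus a hardware memory barrier
 * (typically "dmb ish" with GCC on ARMv7). */
static inline void full_fence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* The answer's example with a hardware barrier instead of the
 * compiler-only one. */
int c_hw(int *d, int *e)
{
    int r;
    d[0] += 1;
    r = e[0];
    full_fence();   /* ordering point for the compiler and, as implemented by GCC, the CPU */
    d[1] += 1;
    return r;
}
```

The trade-off is cost: the compiler-only fence is free at run time, while the hardware barrier stalls the pipeline until earlier accesses are ordered, so it is worth using only where another observer (another core or a device) actually depends on the order.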


answered Sep 21 '22 12:09

auselen


Tags: c, gcc, volatile, arm, embedded-linux

vnr1992
