
volatile under the hood

I would like some help to better understand part of the following passage:
"The volatile keyword qualifier indicates that the variable can be changed outside of the program. For example, an external device may write data to a port. Compilers will sometimes temporarily use a cache, or register, to hold the value in a memory location for optimization purposes. If the external write modifies the memory location, then this change will not be reflected in the cached or register value." (From the book Understanding and Using C Pointers, pp. 178-179.)

The ambiguity I have is between these two phrases: "to hold the value in a memory location" and "If the external write modifies the memory location".

My problem is this: I get the impression that if an external device writes data to a port, the data is first stored in some location (???), then copied to the register/cache (??), and only then ends up in the variable in the C source code. I am clearly misunderstanding something. As far as I know, when data travels from a peripheral to the MCU's RAM, the normal flow should be: external device -> small temporary buffer -> variable in RAM.

#define PORT 0xB0000000
unsigned int volatile * const port = (unsigned int*) PORT;
*port = 0x0BF4; // write to port
value = *port; // read from port
Mynicks asked Feb 23 '17 15:02


2 Answers

Memory-mapped I/O devices don't go through the CPU core's registers (or, typically, its cache). That's why they're called external: they just hang somewhere on the memory bus, pretending to be memory.

So values from such a device will appear directly in what (to the CPU) looks like memory.

In the example you gave, this:

*port = 0x0BF4; // write to port

could perhaps cause an A/D converter to start a conversion, and this

value = *port; // read from port

could read in the resulting value. This is not a very typical design (A/D converters tend to be a bit more complicated than that, and so on) but it's possible.

If a compiler thought "hey, that is just a read from a location to which this value was just written", it might replace the two statements with

value = 0x0BF4; // "optimized", but broken since no more I/O occurs

This would ruin your day, if you were trying to read values from that A/D converter.

Declaring the location volatile tells the compiler to not make any assumptions about the side-effects of accesses to the location.
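
As a minimal sketch of that point, the write-then-read sequence from the question can be wrapped in a function that takes the port as a pointer to volatile. The names reg_write, reg_read and adc_convert are made up for this illustration. Passing the address as a parameter is only an assumption that lets the function be exercised against an ordinary variable on a desktop machine; on a real target the pointer would be the fixed address from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not from any vendor header. Because the pointee is
   volatile, every call performs a real store or a real load. */
static inline void reg_write(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

static inline uint32_t reg_read(volatile uint32_t *reg)
{
    return *reg;
}

/* Write the command word, then read the port back. */
uint32_t adc_convert(volatile uint32_t *port)
{
    assert(port != NULL);
    reg_write(port, 0x0BF4);  /* the store must really happen */
    return reg_read(port);    /* the load must really happen; it may not be
                                 folded into the constant 0x0BF4 */
}
```

Pointed at plain RAM this simply returns what was written. Pointed at a real converter, the value read back would be whatever the device put there.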

If you look at something like an STM32F4 ARM-based microcontroller, it has tons of memory-mapped I/O (serial ports, USB controller, Ethernet, timers, A/D and D/A converters, ... they're all there) plus a bunch of internal (to the core, but still memory-mapped) things.

unwind answered Sep 23 '22 21:09


As stated by others, these are items that are external to the CPU core itself. It could be RAM, or it could be a memory-mapped peripheral (a UART status register, say, or a timer register).

#define SOME_STATUS_REGA  (*((volatile unsigned int *)0x10008000))
void fun ( void )
{
    while(SOME_STATUS_REGA==0) continue;
}
#define SOME_STATUS_REGB  (*((unsigned int *)0x10008000))
void more_fun ( void )
{
    while(SOME_STATUS_REGB==0) continue;
}

With one target and toolchain this produces

00000000 <fun>:
   0:   e59f200c    ldr r2, [pc, #12]   ; 14 <fun+0x14>
   4:   e5923000    ldr r3, [r2]
   8:   e3530000    cmp r3, #0
   c:   0afffffc    beq 4 <fun+0x4>
  10:   e12fff1e    bx  lr
  14:   10008000    andne   r8, r0, r0

00000018 <more_fun>:
  18:   e59f300c    ldr r3, [pc, #12]   ; 2c <more_fun+0x14>
  1c:   e5933000    ldr r3, [r3]
  20:   e3530000    cmp r3, #0
  24:   112fff1e    bxne    lr
  28:   eafffffe    b   28 <more_fun+0x10>
  2c:   10008000    andne   r8, r0, r0

You can see that more_fun, the non-volatile case, reads the location one time and does the comparison one time, and then either returns or goes into an infinite loop. The compiler has done what we told it to do: as far as it can tell nothing can change that variable, so there is no reason to burn clock cycles re-reading something that won't change. If the first and only read was not zero, the function returns. If it was zero, the compiler assumes it will always be zero, so the code spins forever at address 0x28.

If you make it volatile you are "asking" the compiler to read or write it every time your code accesses it. You can see this in the fun case: it goes back every time through the loop to read that address to see if it has changed. The volatile keyword is what made the difference between these two behaviors.

It doesn't have to be hardware that changes these values. If you use a global variable to communicate between an ISR and foreground code, then that variable in memory can be changed by the ISR and/or by the foreground code, so both need to treat it as volatile.
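
Here is a rough sketch of that ISR/foreground pattern that can be tried on a desktop. It assumes a standard C signal handler as a stand-in for the ISR and raise() as a stand-in for the hardware interrupt; fake_isr, data_ready and wait_for_data are hypothetical names. On a microcontroller the handler would be a real interrupt routine installed through the vector table instead.

```c
#include <assert.h>
#include <signal.h>

/* Flag shared between the handler and the foreground code.
   volatile sig_atomic_t is the type standard C allows a signal
   handler to write. */
static volatile sig_atomic_t data_ready = 0;

/* Plays the role of the ISR (hypothetical name). */
static void fake_isr(int signo)
{
    (void)signo;
    data_ready = 1;
}

/* Returns 1 once the flag has been seen set, -1 on setup failure. */
int wait_for_data(void)
{
    data_ready = 0;
    if (signal(SIGINT, fake_isr) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)      /* stand-in for the hardware interrupt */
        return -1;
    while (data_ready == 0)      /* re-read on every pass because of volatile */
        continue;
    signal(SIGINT, SIG_DFL);     /* restore the default action */
    assert(data_ready == 1);
    return 1;
}
```

Without the volatile on data_ready, an optimizer would be free to treat the loop the same way it treated more_fun above.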

You also have the case of a multicore/multithreaded processor where each core/thread independently has access to shared resources. Not only do you need volatile in that situation, but you might also need that RAM to be uncached if the cores do not share the same cache, and you may need hardware and/or software locking if atomic operations are required (ldrex/strex in the ARM world are the first step for that).
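
For the atomic part only, one possible sketch uses C11 <stdatomic.h>. That header is my substitution, not something the examples here rely on, and it does nothing about the caching question. On ARM cores that have ldrex/strex, compilers typically build these operations out of those instructions, but that is toolchain dependent. shared_flags and the flags_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical status word shared between cores or threads.
   Static storage means it starts out as zero. */
static atomic_uint shared_flags;

/* Atomically set bits; returns the value before the update. */
unsigned flags_set(unsigned mask)
{
    assert(mask != 0);  /* setting no bits is almost certainly a caller bug */
    return atomic_fetch_or(&shared_flags, mask);
}

/* Atomically clear bits; returns the value before the update. */
unsigned flags_clear(unsigned mask)
{
    assert(mask != 0);
    return atomic_fetch_and(&shared_flags, ~mask);
}

/* Atomically read the current value. */
unsigned flags_get(void)
{
    return atomic_load(&shared_flags);
}
```

The point of the fetch-style calls is that the read, the modify and the write happen as one indivisible step, which a plain volatile x |= mask does not promise.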

EDIT

Another demonstration: the problem is not only with reads, but with writes as well. Let's say you have a peripheral where you write a config register to set up some mode, then write it again to enable it with that mode. Or you have a hardware interface where each write increments some internal pointer, and you do a series of writes to do something.

#define SOMETHING1 (*((volatile unsigned char *)0x10002000))
void fun ( void )
{
    SOMETHING1=5;
    SOMETHING1=5;
    SOMETHING1=6;
}
#define SOMETHING2 (*((unsigned char *)0x10002000))
void more_fun ( void )
{
    SOMETHING2=5;
    SOMETHING2=5;
    SOMETHING2=6;
}

Without volatile, that peripheral is not going to operate properly. The repeated writes to the same address are treated as dead stores and optimized out, leaving only the last one.

00000000 <fun>:
   0:   e3a02005    mov r2, #5
   4:   e3a01006    mov r1, #6
   8:   e59f300c    ldr r3, [pc, #12]   ; 1c <fun+0x1c>
   c:   e5c32000    strb    r2, [r3]
  10:   e5c32000    strb    r2, [r3]
  14:   e5c31000    strb    r1, [r3]
  18:   e12fff1e    bx  lr
  1c:   10002000    andne   r2, r0, r0

00000020 <more_fun>:
  20:   e3a02006    mov r2, #6
  24:   e59f3004    ldr r3, [pc, #4]    ; 30 <more_fun+0x10>
  28:   e5c32000    strb    r2, [r3]
  2c:   e12fff1e    bx  lr
  30:   10002000    andne   r2, r0, r0
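
The config-then-enable sequence and the write-per-step interface described earlier could be sketched with a register struct whose members are volatile. The layout, the bit meanings and the names periph_regs, periph_start and periph_send are invented for the example; a real part's datasheet defines the actual ones. Taking the register block as a pointer is an assumption that lets the code run against a fake struct in RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peripheral layout. */
struct periph_regs {
    volatile uint8_t config;  /* mode in bits 1..7, enable in bit 0 */
    volatile uint8_t data;    /* each write advances an internal pointer */
};

#define PERIPH_ENABLE 0x01u

/* Select a mode first, then write again with the enable bit set.
   Both stores hit the same register and both must reach the hardware. */
void periph_start(struct periph_regs *p, uint8_t mode)
{
    assert(p != NULL);
    p->config = (uint8_t)(mode << 1);
    p->config = (uint8_t)((mode << 1) | PERIPH_ENABLE);
}

/* Push a buffer through the data register, one byte per write. */
void periph_send(struct periph_regs *p, const uint8_t *buf, size_t len)
{
    assert(p != NULL && (buf != NULL || len == 0));
    for (size_t i = 0; i < len; i++)
        p->data = buf[i];
}
```

On a target, p would be something like (struct periph_regs *)0x10002000 rather than the address of a variable.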

EDIT2

Clang/LLVM demonstrates the problem as well:

#define A (*((volatile unsigned char *)0x10002000))
void afun ( void )
{
    A = 4;
    A = 5;
    A = 6;
    A |= 1;
    while(A==0) continue;
}
#define B (*((unsigned char *)0x10002000))
void bfun ( void )
{
    B = 4;
    B = 5;
    B = 6;
    B |= 1;
    while(B==0) continue;
}

Producing

00000000 <afun>:
   0:   e3a00a02    mov r0, #8192   ; 0x2000
   4:   e3a01004    mov r1, #4
   8:   e3800201    orr r0, r0, #268435456  ; 0x10000000
   c:   e5c01000    strb    r1, [r0]
  10:   e3a01005    mov r1, #5
  14:   e5c01000    strb    r1, [r0]
  18:   e3a01006    mov r1, #6
  1c:   e5c01000    strb    r1, [r0]
  20:   e5d01000    ldrb    r1, [r0]
  24:   e3811001    orr r1, r1, #1
  28:   e5c01000    strb    r1, [r0]
  2c:   e5d01000    ldrb    r1, [r0]
  30:   e3510000    cmp r1, #0
  34:   0afffffc    beq 2c <afun+0x2c>
  38:   e12fff1e    bx  lr

0000003c <bfun>:
  3c:   e3a00a02    mov r0, #8192   ; 0x2000
  40:   e3a01007    mov r1, #7
  44:   e3800201    orr r0, r0, #268435456  ; 0x10000000
  48:   e5c01000    strb    r1, [r0]
  4c:   e12fff1e    bx  lr

Leaving the volatile off won't hurt you if you are doing onesy-twosy things that give the optimizer nothing to remove (a single write to each register in some sequence, or a single read of a register, with "single" also implying no loops). It will most definitely hurt you if you are doing more than one write to the same register (which often happens when configuring a peripheral) or a read-modify-write (x |= something, y &= something, z ^= something, etc.).
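
A small sketch of those read-modify-write cases, written as helpers that take a pointer to volatile. The names reg_set_bits, reg_clear_bits and reg_toggle_bits are hypothetical, and the pointer parameter is only there so they can be tried on an ordinary variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each helper is a load, a modify and a store. With a volatile pointee
   the load and the store are both real accesses to the register. */
void reg_set_bits(volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);
    *reg = (uint8_t)(*reg | mask);
}

void reg_clear_bits(volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);
    *reg = (uint8_t)(*reg & (uint8_t)~mask);
}

void reg_toggle_bits(volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);
    *reg = (uint8_t)(*reg ^ mask);
}
```

Compare this with bfun above, where B = 6 followed by B |= 1 collapsed into a single store of 7 with no read at all. Note that volatile does not make the sequence atomic; an interrupt can still land between the load and the store.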

If you are using a toolchain that doesn't have an optimizer, or you choose not to optimize, you won't have this problem. But code that leaves the volatiles off is not portable, and you will eventually run into trouble if you don't habitually take care with variables/code that cross compile or other similar domains (hardware is a separate compile domain from software).

old_timer answered Sep 24 '22 21:09