Should volatile still be used for sharing data with ISRs in modern C++?

Tags:

c++

embedded

volatile

isr

I've seen some flavors of this question around, with mixed answers, and I'm still unsure whether they are up to date and fully apply to my use case, so I'll ask here. Do let me know if it's a duplicate!

Given that I'm developing for STM32 microcontrollers (bare-metal) using C++17 and the gcc-arm-none-eabi-9 toolchain:

Do I still need to use volatile for sharing data between an ISR and main()?

volatile std::int32_t flag = 0;

extern "C" void ISR()
{
    flag = 1;
}

int main()
{
    while (!flag) { ... }
}

It's clear to me that I should always use volatile for accessing memory-mapped HW registers.

However, for the ISR use case I don't know whether it can be considered a case of "multithreading" or not. If it is, people recommend using C++11's new threading features (e.g. std::atomic). I'm aware of the difference between volatile (don't optimize) and atomic (safe access), so the answers suggesting std::atomic confuse me here.

For the case of "real" multithreading on x86 systems I haven't seen the need to use volatile.

In other words: can the compiler know that flag can change inside an ISR? If not, how can it know it in regular multithreaded applications?

Thanks!

asked Aug 18 '20 15:08

user1011113

1 Answer

I think that in this case both volatile and atomic will most likely work in practice on 32-bit ARM. At least in an older version of the STM32 tools, I saw that the C atomics were in fact implemented using volatile for small types.

Volatile will work because the compiler may not optimize away any access to the variable that appears in the code.

However, the generated code must differ for types that cannot be loaded in a single instruction. If you use a volatile int64_t, the compiler will happily load it in two separate instructions. If the ISR runs between loading the two halves of the variable, you will load half the old value and half the new value.

Unfortunately, using std::atomic<std::int64_t> may also fail with interrupt service routines if the implementation is not lock-free. On Cortex-M, 64-bit accesses are not necessarily lock-free, so atomic should not be relied on without checking the implementation. Depending on the implementation, the system might deadlock if the locking mechanism is not reentrant and the interrupt happens while the lock is held. Since C++17, this can be queried for a type by checking std::atomic<T>::is_always_lock_free. An answer for one specific atomic variable (this may depend on alignment) has been available since C++11 by calling flagA.is_lock_free().
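The two lock-free checks mentioned above can be sketched as follows. This is only an illustration: the names flag_is_safe_for_isr and wide_atomic_is_lock_free are made up for this example, and the results depend on the target and toolchain, so the values on a host compiler say nothing about a given Cortex-M part.

```cpp
#include <atomic>
#include <cstdint>

// Compile-time check (C++17): refuse to build if the flag type would need a lock.
static_assert(std::atomic<std::int32_t>::is_always_lock_free,
              "std::atomic<std::int32_t> must be lock-free to be shared with an ISR");

std::atomic<std::int32_t> flagA{0};

// Run-time check (C++11) for this one object; the answer may depend on alignment.
// Hypothetical helper name, not part of any library.
bool flag_is_safe_for_isr()
{
    return flagA.is_lock_free();
}

// For wider types, record the answer instead of asserting it, since 64-bit
// atomics are not necessarily lock-free on the target.
constexpr bool wide_atomic_is_lock_free =
    std::atomic<std::int64_t>::is_always_lock_free;
```

Putting the static_assert next to the declaration of the shared variable means a change of type or target fails at build time rather than deadlocking at run time.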

So longer data must be protected by a separate mechanism (for example, by turning off interrupts around the access and making the variable atomic or volatile).
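A minimal sketch of such a mechanism, assuming a single-core target where masking interrupts is enough to keep the ISR out. Everything here is hypothetical naming for illustration (FakeIrqControl, CriticalSection, tickCount, onTick, readTicks). The fake policy only toggles a bool so the sketch builds on a host; on a Cortex-M target with CMSIS, the policy would presumably wrap __get_PRIMASK(), __disable_irq() and __set_PRIMASK() instead.

```cpp
#include <cstdint>

// Hypothetical interrupt-control policy. It only simulates the interrupt mask
// with a bool; a real target would read and write the CPU's mask register.
struct FakeIrqControl {
    using State = bool;
    static inline bool enabled = true;   // stand-in for "interrupts enabled"

    static State save_and_disable()
    {
        State was = enabled;
        enabled = false;
        return was;
    }
    static void restore(State was) { enabled = was; }
};

// RAII guard: masks interrupts on construction and restores the previous
// state on destruction, so nested guards do not re-enable interrupts early.
template <class Irq>
class CriticalSection {
public:
    CriticalSection() : saved_(Irq::save_and_disable()) {}
    ~CriticalSection() { Irq::restore(saved_); }
    CriticalSection(const CriticalSection&) = delete;
    CriticalSection& operator=(const CriticalSection&) = delete;

private:
    typename Irq::State saved_;
};

// 64-bit value written by the ISR; too wide for a single load on 32-bit ARM.
volatile std::int64_t tickCount = 0;

// Would be called from the ISR.
void onTick()
{
    tickCount = tickCount + 1;
}

// Called from main(): both halves are read while the ISR cannot run.
template <class Irq = FakeIrqControl>
std::int64_t readTicks()
{
    CriticalSection<Irq> guard;
    return tickCount;
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling interrupts, keeps the guard safe to use inside code that already runs with interrupts masked.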

So the correct way is to use std::atomic, as long as the access is lock free. If you are concerned about performance, it may pay off to select the appropriate memory order and stick to values that can be loaded in a single instruction.
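As a rough sketch of choosing a memory order explicitly: the ISR below publishes a value and then sets the flag with a release store, and the main loop polls with an acquire load before reading the value. The names sample, ready, SampleReady_ISR and waitForSample are invented for this example, and it assumes a one-shot event. Whether this is cheaper than the default sequentially consistent access depends on the target and compiler.

```cpp
#include <atomic>
#include <cstdint>

std::int32_t sample = 0;               // plain data, published through `ready`
std::atomic<std::int32_t> ready{0};    // single-word flag, as in the question

// Hypothetical ISR: write the payload first, then set the flag.
// The release store keeps the payload write from being moved after it.
extern "C" void SampleReady_ISR()
{
    sample = 1234;
    ready.store(1, std::memory_order_release);
}

// Main-loop side: spin until the flag is set, then read the payload.
// The acquire load keeps the payload read from being moved before it.
std::int32_t waitForSample()
{
    while (!ready.load(std::memory_order_acquire)) {
    }
    return sample;
}
```

If only the flag itself matters and no other data is published through it, std::memory_order_relaxed on both sides is the analogue of the waitRelaxed() function in this answer.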

Not using either would be wrong: with optimization enabled, the compiler may check the flag only once.

These functions all wait for a flag, but they get translated differently:

#include <atomic>
#include <cstdint>

using FlagT = std::int32_t;

volatile FlagT flag = 0;
void waitV()
{
    while (!flag) {}
}

std::atomic<FlagT> flagA;
void waitA()
{
    while(!flagA) {}    
}

void waitRelaxed()
{
    while(!flagA.load(std::memory_order_relaxed)) {}    
}

FlagT wrongFlag;
void waitWrong()
{
    while(!wrongFlag) {}
}

Using volatile you get a loop that reexamines the flag as you wanted:

waitV():
        ldr     r2, .L5
.L2:
        ldr     r3, [r2]
        cmp     r3, #0
        beq     .L2
        bx      lr
.L5:
        .word   .LANCHOR0

Atomic with the default sequentially consistent access produces synchronized access:

waitA():
        push    {r4, lr}
.L8:
        bl      __sync_synchronize
        ldr     r3, .L11
        ldr     r4, [r3, #4]
        bl      __sync_synchronize
        cmp     r4, #0
        beq     .L8
        pop     {r4}
        pop     {r0}
        bx      r0
.L11:
        .word   .LANCHOR0

If you do not care about the memory order you get a working loop just as with volatile:

waitRelaxed():
        ldr     r2, .L17
.L14:
        ldr     r3, [r2, #4]
        cmp     r3, #0
        beq     .L14
        bx      lr
.L17:
        .word   .LANCHOR0

Using neither volatile nor atomic will bite you with optimization enabled, as the flag is only checked once:

waitWrong():
        ldr     r3, .L24
        ldr     r3, [r3, #8]
        cmp     r3, #0
        bne     .L23
.L22:                        // infinite loop!
        b       .L22
.L23:
        bx      lr
.L24:
        .word   .LANCHOR0
flag:
flagA:
wrongFlag:

answered Sep 19 '22 20:09

PaulR
