Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does "volatile" guarantee anything at all in portable C code for multi-core systems?

After looking at a bunch of other questions and their answers, I get the impression that there is no widespread agreement on what the "volatile" keyword in C means exactly.

Even the standard itself does not seem to be clear enough for everyone to agree on what it means.

Among other problems:

  1. It seems to provide different guarantees depending on your hardware and depending on your compiler.
  2. It affects compiler optimizations but not hardware optimizations, so on an advanced processor that does its own run-time optimizations, it is not even clear whether the compiler can prevent whatever optimization you want to prevent. (Some compilers do generate instructions to prevent some hardware optimizations on some systems, but this does not appear to be standardized in any way.)

To summarize the problem, it appears (after reading a lot) that "volatile" guarantees something like: The value will be read/written not just from/to a register, but at least to the core's L1 cache, in the same order that the reads/writes appear in the code. But this seems useless, since reading/writing from/to a register is already sufficient within the same thread, while coordinating with L1 cache doesn't guarantee anything further regarding coordination with other threads. I can't imagine when it could ever be important to sync just with L1 cache.

USE 1
The only widely-agreed-upon use of volatile seems to be for old or embedded systems where certain memory locations are hardware-mapped to I/O functions, like a bit in memory that controls (directly, in the hardware) a light, or a bit in memory that tells you whether a keyboard key is down or not (because it is connected by the hardware directly to the key).

It seems that "use 1" does not occur in portable code whose targets include multi-core systems.

USE 2
Not too different from "use 1" is memory that could be read or written at any time by an interrupt handler (which might control a light or store info from a key). But already for this we have the problem that depending on the system, the interrupt handler might run on a different core with its own memory cache, and "volatile" does not guarantee cache coherency on all systems.

So "use 2" seems to be beyond what "volatile" can deliver.

USE 3
The only other undisputed use I see is to prevent mis-optimization of accesses via different variables pointing to the same memory that the compiler doesn't realize is the same memory. But this is probably only undisputed because people aren't talking about it -- I only saw one mention of it. And I thought the C standard already recognized that "different" pointers (like different args to a function) might point to the same item or nearby items, and already specified that the compiler must produce code that works even in such cases. However, I couldn't quickly find this topic in the latest (500 page!) standard.

So "use 3" maybe doesn't exist at all?

Hence my question:

Does "volatile" guarantee anything at all in portable C code for multi-core systems?


EDIT -- update

After browsing the latest standard, it is looking like the answer is at least a very limited yes:
1. The standard repeatedly specifies special treatment for the specific type "volatile sig_atomic_t". However the standard also says that use of the signal function in a multi-threaded program results in undefined behavior. So this use case seems limited to communication between a single-threaded program and its signal handler.
2. The standard also specifies a clear meaning for "volatile" in relation to setjmp/longjmp. (Example code where it matters is given in other questions and answers.)

So the more precise question becomes:
Does "volatile" guarantee anything at all in portable C code for multi-core systems, apart from (1) allowing a single-threaded program to receive information from its signal handler, or (2) allowing setjmp code to see variables modified between setjmp and longjmp?

This is still a yes/no question.

If "yes", it would be great if you could show an example of bug-free portable code which becomes buggy if "volatile" is omitted. If "no", then I suppose a compiler is free to ignore "volatile" outside of these two very specific cases, for multi-core targets.

like image 976
Matt Avatar asked Nov 04 '19 14:11

Matt


People also ask

What is the purpose of volatile in C?

The volatile keyword is intended to prevent the compiler from applying any optimizations on objects that can change in ways that cannot be determined by the compiler. Objects declared as volatile are omitted from optimization because their values can be changed by code outside the scope of current code at any time.

What is volatile in C and why would you use it when you communicate from software to hardware?

A volatile keyword in C is nothing but a qualifier that is used by the programmer when they declare a variable in source code. It is used to inform the compiler that the variable value can be changed any time without any task given by the source code. Volatile is usually applied to a variable when we are declaring it.

How does volatile affect code optimization by compiler?

Use the volatile keyword when declaring variables that the compiler must not optimize. If you do not use the volatile keyword where it is needed, then the compiler might optimize accesses to the variable and generate unintended code or remove intended functionality.

Where volatile variables are stored in C?

There's no reason for a volatile variable to be stored in any "special" section of memory. It is normally stored together with any other variables, including non-volatile ones. If some compiler decides to store volatile variables in some special section of memory - there's nothing to prevent it from doing so.

What is volatile in C?

Introduction to Volatile in C. A volatile keyword in C is nothing but a qualifier that is used by the programmer when they declare a variable in source code. It is used to inform the compiler that the variable value can be changed any time without any task given by the source code. Volatile is usually applied to a variable when we are declaring it.

Is the C++11 ISO standard volatile keyword different in MSVC?

If you are familiar with the C# volatile keyword, or familiar with the behavior of volatile in earlier versions of the Microsoft C++ compiler (MSVC), be aware that the C++11 ISO Standard volatile keyword is different and is supported in MSVC when the /volatile:iso compiler option is specified. (For ARM, it's specified by default).

What is the use of the /volatile compiler option?

When the /volatile:ms compiler option is used—by default when architectures other than ARM are targeted—the compiler generates extra code to maintain ordering among references to volatile objects in addition to maintaining ordering to references to other global objects. In particular:

How to implement volatile keyword in C?

Examples to Implement Volatile in C. Here is the sample code to demonstrate the working of volatile keyword: Example #1. Without using keyword Volatile: Code: #include<stdio.h> // C header file for standard input and output int a = 0 ; // initilaizing and declaring the integer a to value 0. int main // main class


4 Answers

First of all, there's historically been various hiccups regarding different intepretations of the meaning of volatile access and similar. See this study: Volatiles Are Miscompiled, and What to Do about It.

Apart from the various issues mentioned in that study, the behavior of volatile is portable, save for one aspect of them: when they act as memory barriers. A memory barrier is some mechanism which is there to prevent concurrent unsequenced execution of your code. Using volatile as a memory barrier is certainly not portable.

Whether the C language guarantees memory behavior or not from volatile is apparently arguable, though personally I think the language is clear. First we have the formal definition of side effects, C17 5.1.2.3:

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

The standard defines the term sequencing, as a way of determining order of evaluation (execution). The definition is formal and cumbersome:

Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread, which induces a partial order among those evaluations. Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. (Conversely, if A is sequenced before B, then B is sequenced after A.) If A is not sequenced before or after B, then A and B are unsequenced. Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which.13) The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B. (A summary of the sequence points is given in annex C.)

The TL;DR of the above is basically that in case we have an expression A which contains side-effects, it must be done executing before another expression B, in case B is sequenced after A.

Optimizations of C code are made possible through this part:

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

This means that the program may evaluate (execute) expressions in the order that the standard mandates elsewhere (order of evaluation etc). But it need not evaluate (execute) a value if it can deduce that it is not used. For example, the operation 0 * x doesn't need to evaluate x and simply replace the expression with 0.

Unless accessing a variable is a side-effect. Meaning that in case x is volatile, it must evaluate (execute) 0 * x even though the result will always be 0. Optimization is not allowed.

Furthermore, the standard speaks of observable behavior:

The least requirements on a conforming implementation are:

  • Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
    /--/ This is the observable behavior of the program.

Given all of the above, a conforming implementation (compiler + underlying system) may not execute the access of volatile objects in an unsequenced order, in case the semantics of the written C source says otherwise.

This means that in this example

volatile int x;
volatile int y;
z = x;
z = y;

Both assignment expressions must be evaluated and z = x; must be evaluated before z = y;. A multi-processor implementation that outsource these two operations to two different unsequenced cores is not conforming!

The dilemma is that compilers can't do much about things like pre-fetch caching and instruction pipelining etc, particularly not when running on top of an OS. And so compilers hand that problem over to the programmers, telling them that memory barriers is now the programmer's responsibility. While the C standard clearly states that the problem needs to be solved by the compiler.

The compiler doesn't necessarily care to solve the problem though, and so volatile for the sake of acting as a memory barrier is non-portable. It has become a quality of implementation issue.

like image 112
Lundin Avatar answered Oct 07 '22 16:10

Lundin


The ISO C standard, no, but in practice all machines that we run threads across have coherent shared memory, so volatile in practice works somewhat like _Atomic with memory_order_relaxed, at least for pure-load / pure-store operations on small-enough types. (But of course only _Atomic will give you atomic RMWs for stuff like n += 1;)

There's also the question of what exactly volatile means to a compiler. The standard allows wiggle room, but in real-world compilers, it means the load or store has to actually happen in the asm. No more, no less. (A compiler that didn't work this way couldn't correctly compile pre-C11 multi-threaded code that used hand-rolled volatile, so that de-facto standard is a requirement for compilers to be generally useful and for anyone to want to actually use them. ISO C leaves enough choice up to the implementation that a DeathStation 9000 could be ISO C compliant and almost totally unusable for real programs, and break most real code bases.)

The requirement that volatile accesses are guaranteed to happen in source order is normally interpreted as putting the asm in that order, leaving runtime reordering at the mercy of the target machine's memory model. volatile accesses aren't ordered wrt. anything else, so plain operations can still optimize away separately from them.


When to use volatile with multi threading? is a C++ version of the question. Answer: basically never, use stdatomic. My answer there explains why cache-coherency makes volatile useful in practice: there are no C or C++ implementations I'm aware of where shared_var.store(1, std::memory_order_relaxed) needs to explicitly flush anything to make the store visible to other cores. It compiles to just a normal asm store instruction, for variables narrow enough to be "naturally" atomic.

(Memory barriers just make this core wait, e.g. until the store commits from the store buffer to L1d cache and thus becomes globally visible, before doing later loads/stores. So they order this core's accesses to coherent shared memory.)

For example, the Linux kernel depends on this, using volatile for inter-thread visibility, and asm() for memory barriers to order those accesses, and for atomic-RMW operations. All multi-core systems that can run a single instance of Linux across those cores have coherent shared memory.

There are some rare systems with shared memory that isn't coherent, for example some clusters. But you don't run threads of the same process across different coherency domains. (Or run a single instance of the OS on it). Instead, the shared memory has to get mapped differently from normal write-back cacheable, or you have to do explicit flushing.

like image 27
Peter Cordes Avatar answered Oct 07 '22 15:10

Peter Cordes


I'm no expert, but cppreference.com has what appears to me to be some pretty good information on volatile. Here's the gist of it:

Every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine (that is, all writes are completed at some time before the next sequence point). This means that within a single thread of execution, a volatile access cannot be optimized out or reordered relative to another visible side effect that is separated by a sequence point from the volatile access.

It also gives some uses:

Uses of volatile

1) static volatile objects model memory-mapped I/O ports, and static const volatile objects model memory-mapped input ports, such as a real-time clock

2) static volatile objects of type sig_atomic_t are used for communication with signal handlers.

3) volatile variables that are local to a function that contains an invocation of the setjmp macro are the only local variables guaranteed to retain their values after longjmp returns.

4) In addition, volatile variables can be used to disable certain forms of optimization, e.g. to disable dead store elimination or constant folding for microbenchmarks.

And of course, it mentions that volatile is not useful for thread synchronization:

Note that volatile variables are not suitable for communication between threads; they do not offer atomicity, synchronization, or memory ordering. A read from a volatile variable that is modified by another thread without synchronization or concurrent modification from two unsynchronized threads is undefined behavior due to a data race.

like image 7
Fred Larson Avatar answered Oct 07 '22 17:10

Fred Larson


To summarize the problem, it appears (after reading a lot) that "volatile" guarantees something like: The value will be read/written not just from/to a register, but at least to the core's L1 cache, in the same order that the reads/writes appear in the code.

No, it absolutely does not. And that makes volatile almost useless for the purpose of MT safe code.

If it did, then volatile would be quite good for variables shared by multiple thread as ordering the events in the L1 cache is all you need to do in typical CPU (that is either multi-core or multi-CPU on motherboard) capable of cooperating in a way that makes a normal implementation of either C/C++ or Java multithreading possible with typical expected costs (that is, not a huge cost on most atomic or non-contented mutex operations).

But volatile does not provide any guaranteed ordering (or "memory visibility") in the cache either in theory or in practice.

(Note: the following is based on sound interpretation of the standard documents, the standard's intent, historical practice, and a deep understand of the expectations of compiler writers. This approach based on history, actual practices, and expectations and understanding of real persons in the real world, which is much stronger and more reliable than parsing the words of a document that is not known to be stellar specification writing and which has been revised many times.)

In practice, volatile does guarantees ptrace-ability that is the ability to use debug information for the running program, at any level of optimization, and the fact the debug information makes sense for these volatile objects:

  • you may use ptrace (a ptrace-like mechanism) to set meaningful break points at the sequence points after operations involving volatile objects: you can really break at exactly these points (note that this works only if you are willing to set many break points as any C/C++ statement may be compiled to many different assembly start and end points, as in a massively unrolled loop);
  • while a thread of execution of stopped, you may read the value of all volatile objects, as they have their canonical representation (following the ABI for their respective type); a non volatile local variable could have an atypical representation, f.ex. a shifted representation: a variable used for indexing an array might be multiplied by the size of individual objects, for easier indexing; or it might be replaced by a pointer to an array element (as long as all uses of the variable as similarly converted) (think changing dx to du in an integral);
  • you can also modify those objects (as long as the memory mappings allow that, as volatile object with static lifetime that are const qualified might be in a memory range mapped read only).

Volatile guarantee in practice a little more than the strict ptrace interpretation: it also guarantees that volatile automatic variables have an address on the stack, as they aren't allocated to a register, a register allocation which would make ptrace manipulations more delicate (compiler can output debug information to explain how variables are allocated to registers, but reading and changing register state is slightly more involved than accessing memory addresses).

Note that full program debug-ability, that is considering all variables volatile at least at sequence points, is provided by the "zero optimization" mode of the compiler, a mode which still performs trivial optimizations like arithmetic simplifications (there is usually no guaranteed no optimization at all mode). But volatile is stronger than non optimization: x-x can be simplified for a non volatile integer x but not of a volatile object.

So volatile means guaranteed to be compiled as is, like the translation from source to binary/assembly by the compiler of a system call isn't a reinterpretation, changed, or optimized in any way by a compiler. Note that library calls may or may not be system calls. Many official system functions are actually library function that offer a thin layer of interposition and generally defer to the kernel at the end. (In particular getpid doesn't need to go to the kernel and could well read a memory location provided by the OS containing the information.)

Volatile interactions are interactions with the outside world of the real machine, which must follow the "abstract machine". They aren't internal interactions of program parts with other program parts. The compiler can only reason about what it knows, that is the internal program parts.

The code generation for a volatile access should follow the most natural interaction with that memory location: it should be unsurprising. That means that some volatile accesses are expected to be atomic: if the natural way to read or write the representation of a long on the architecture is atomic, then it's expected that a read or write of a volatile long will be atomic, as the compiler should not generate silly inefficient code to access volatile objects byte by byte, for example.

You should be able to determine that by knowing the architecture. You don't have to know anything about the compiler, as volatile means that the compiler should be transparent.

But volatile does no more than force the emission of expected assembly for the least optimized for particular cases to do a memory operation: volatile semantics means general case semantic.

The general case is what the compiler does when it doesn't have any information about a construct: f.ex. calling a virtual function on an lvalue via dynamic dispatch is a general case, making a direct call to the overrider after determining at compile time the type of the object designated by the expression is a particular case. The compiler always have a general case handling of all constructs, and it follows the ABI.

Volatile does nothing special to synchronize threads or provide "memory visibility": volatile only provides guarantees at the abstract level seen from inside a thread executing or stopped, that is the inside of a CPU core:

  • volatile says nothing about which memory operations reach main RAM (you may set specific memory caching types with assembly instructions or system calls to obtain these guarantees);
  • volatile doesn't provide any guarantee about when memory operations will be committed to any level of cache (not even L1).

Only the second point means volatile is not useful in most inter threads communication problems; the first point is essentially irrelevant in any programming problem that doesn't involve communication with hardware components outside the CPU(s) but still on the memory bus.

The property of volatile providing guaranteed behavior from the point of the view of the core running the thread means that asynchronous signals delivered to that thread, which are run from the point of view of the execution ordering of that thread, see operations in source code order.

Unless you plan to send signals to your threads (an extremely useful approach to consolidation of information about currently running threads with no previously agreed point of stopping), volatile is not for you.

like image 2
curiousguy Avatar answered Oct 07 '22 15:10

curiousguy