 

C++ Statement Reordering

This is a question about Chandler's answer here (I didn't have a high enough rep to comment): Enforcing statement order in C++

In his answer, suppose foo() has no input or output. It's a black box that does work that is observable eventually, but won't be needed immediately (e.g. executes some callback). So we don't have input/output data locally handy to tell the compiler not to optimize. But I know that foo() will modify the memory somewhere, and the result will be observable eventually. Will the following prevent statement reordering and get the correct timing in this case?

#include <chrono>
#include <iostream>

//I believe this tells the compiler that all memory everywhere will be clobbered?
//(from his cppcon talk: https://youtu.be/nXaxk27zwlk?t=2441)
__attribute__((always_inline)) inline void DoNotOptimize() {
  asm volatile("" : : : "memory");
}

// The compiler has full knowledge of the implementation.
static int ugly_global = 1; //we print this to screen sometime later
static void foo(void) { ugly_global *= 2; }

auto time_foo() {
  using Clock = std::chrono::high_resolution_clock;

  auto t1 = Clock::now();         // Statement 1
  DoNotOptimize();
  foo();                          // Statement 2
  DoNotOptimize();
  auto t2 = Clock::now();         // Statement 3

  return t2 - t1;
}
asked Feb 14 '20 by P. Mattione


People also ask

How to prevent compiler reordering?

To prevent compiler reorderings at other times, you must use a compiler-specific barrier. GCC uses __asm__ __volatile__("":::"memory"); for this purpose. This is different from CPU reordering, a.k.a. the memory-ordering model.

Why compiler reorder instructions?

Compilers and hardware try to reorder programs in order to improve their efficiency, while respecting dependencies; their actions are complementary. The compiler can consider larger reorganizations than the processor and uses more complex heuristics to do so.

Can GCC reorder function calls?

GCC can change the order of functions, because the C standard (e.g. n1570 or newer) allows it to do that. In practice (with optimizations enabled), try compiling foo.c with gcc -Wall -fverbose-asm -O3 foo.c and inspecting the generated assembly.

Can compiler reorder function calls?

If you mean that the difference can not be observed, then yes, the compiler (and even the CPU itself) is free to reorder the operations.


1 Answer

Will the following prevent statement reordering and get the correct timing in this case?

It should not be necessary because the calls to Clock::now should, at the language-definition level, enforce enough ordering. (That is, the C++11 standard says that the high resolution clock ought to get as much information as the system can give here, in the way that is most useful here. See "secondary question" below.)
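For comparison, here is a minimal sketch (reusing the globals from the question) of what that claim means in practice: just the two Clock::now calls around foo, with no asm barriers at all. Whether this really is sufficient is the "secondary question" discussed further down.

#include <chrono>

static int ugly_global = 1;                   // printed sometime later, as in the question
static void foo(void) { ugly_global *= 2; }

auto time_foo_plain() {
  using Clock = std::chrono::high_resolution_clock;
  auto t1 = Clock::now();   // per this answer, already an ordering point
  foo();                    // observable effect: ugly_global is modified
  auto t2 = Clock::now();
  return t2 - t1;
}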

But there is a more general case. It's worth thinking about the question: How does whoever provides the C++ library implementation actually write this function? Or, take C++ itself out of the equation. Given a language standard, how does an implementor—a person or group writing an implementation of that language—get you what you need? Fundamentally, we need to make a distinction between what the language standard requires and how an implementation provider goes about implementing the requirements.

The language itself may be expressed in terms of an abstract machine, and the C and C++ languages are. This abstract machine is pretty loosely defined: it executes some kind of instructions, which access data, but in many cases we don't know how it does these things, or even how big the various data items are (with some exceptions for fixed-size integers like int64_t), and so on. The machine may or may not have "registers" that hold things in ways that cannot be addressed, as well as memory that can be addressed and whose addresses can be recorded in pointers:

p = &var

stores a value in p (in memory or a register) such that using *p accesses the value stored in var (in memory or a register; some machines, especially back in the olden days, have or had addressable registers).1
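To make that concrete, a tiny (purely illustrative) example:

int var = 42;
int *p = &var;   // p records "where var is", wherever the implementation keeps it
*p = 7;          // writing through p changes var,
                 // whether var lives in memory or somewhere else the implementation chose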

Nonetheless, despite all of this abstraction, we want to run real code on real machines. Real machines have real constraints: some instructions might require particular values in particular registers (think about all the bizarre stuff in the x86 instruction sets, or wide-result integer multipliers and dividers that use special-purpose registers, as on some MIPS processors), or cause CPU synchronizations, or whatever.

GCC in particular invented a system of constraints to express what you could or could not do on the machine itself, using the machine's instruction set. Over time, this evolved into user-accessible asm constructs with input, output, and clobber sections. The particular one you show:

__attribute__((always_inline)) inline void DoNotOptimize() {
  asm volatile("" : : : "memory");
}

expresses the idea that "this instruction" (asm; the actual provided instruction is blank) "cannot be moved" (volatile) "and clobbers all of the computer's memory, but no registers" ("memory" as the clobber section).

This is not part of either C or C++ as a language. It's just a compiler construction, supported by GCC and now supported by clang as well. But it suffices to force the compiler to issue all stores-to-memory before the asm, and reload values from memory as needed after the asm, in case they changed when the computer executed the (nonexistent) instruction included in the asm line. There's no guarantee that this will work, or even compile at all, in some other compiler, but as long as we're the implementor, we choose the compiler we're implementing for/with.
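The same construct can also be given inputs. The talk linked in the question pairs the empty clobber above with an escape helper whose input section hands the asm a pointer, so the compiler must assume the pointed-to object can be read or written at that point. A sketch along the lines of what the talk shows:

static inline void escape(void *p) {
  // "g" is a general input constraint: the (empty) asm "uses" the pointer,
  // so the object it points to must be materialized and assumed modified
  asm volatile("" : : "g"(p) : "memory");
}

This is the usual way to keep a purely local result alive in a benchmark without actually storing it anywhere the program needs.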

C++ as a language now has support for ordered memory operations, which an implementor must implement. The implementor can use these asm volatile constructs to achieve the right result, provided they do actually achieve the right result. For instance, if we need to cause the machine itself to synchronize—to emit a memory barrier—we can stick the appropriate machine instruction, such as mfence or membar #sync or whatever it may be, in the asm's instruction-section clause. See also compiler reordering vs memory reordering as Klaus mentioned in a comment.
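As a sketch of both halves of that remark (assuming x86-64 and GCC/clang syntax; the helper names are made up): the machine instruction goes in the asm's instruction section, while the portable language-level spelling is a std::atomic_thread_fence that the implementor must lower to whatever the target needs.

#include <atomic>

static inline void full_fence_x86() {
  asm volatile("mfence" : : : "memory");                 // hardware + compiler barrier
}

static inline void full_fence_portable() {
  std::atomic_thread_fence(std::memory_order_seq_cst);   // implementor lowers this to mfence/membar/dmb/...
}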

It is up to the implementor to find an appropriately effective trick, compiler-specific or not, to get the right semantics while minimizing any runtime slowdown: for instance, we might want to use lfence rather than mfence if that's sufficient, or membar #LoadLoad, or whatever the right thing is for the machine. If our implementation of Clock::now requires some sort of fancy inline asm, we write one. If not, we don't. We make sure that we produce what's required—and then all users of the system can just use it, without needing to know what sort of grubby implementation tricks we had to invoke.
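For instance (an illustrative sketch only, assuming a POSIX system; my_now is a hypothetical name, not the real library code), a high-resolution now can often just wrap clock_gettime, with the opaque call into the C library doing the work and no fancy inline asm at all:

#include <chrono>
#include <ctime>

inline std::chrono::nanoseconds my_now() {
  timespec ts{};
  clock_gettime(CLOCK_MONOTONIC, &ts);    // opaque library call; the compiler cannot see through it
  return std::chrono::seconds(ts.tv_sec) + std::chrono::nanoseconds(ts.tv_nsec);
}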

There's a secondary question here: does the language specification really constrain the implementor the way we think/hope it does? Chris Dodd's comment says he thinks so, and he's usually right on these kinds of questions. A couple of other commenters think otherwise, but I'm with Chris Dodd on this one. I think it is not necessary. You can always compile to assembly, or disassemble the compiled program, to check, though!

If the compiler didn't do the right thing, that asm would force it to do the right thing, in GCC and clang. It probably wouldn't work in other compilers.


1On the KA-10 in particular, the registers were just the first sixteen words of memory. As the Wikipedia page notes, this meant you could put instructions into there and call them. Because the first 16 words were the registers, these instructions ran much faster than other instructions.

answered Sep 30 '22 by torek