I recently came across this brilliant CppCon 2015 talk: Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"
One of the techniques mentioned for preventing the compiler from optimizing code away is to use the functions below.
#include <vector>

static void escape(void *p) {
    asm volatile("" : : "g"(p) : "memory");
}
static void clobber() {
    asm volatile("" : : : "memory");
}
void benchmark()
{
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());
    v.push_back(10);
    clobber();
}
I'm trying to understand this. Questions as follows.
1) What is the advantage of escape over clobber?
2) From the example above, it looks like clobber() prevents the previous statement (push_back) from being optimized away. If that's the case, why is the snippet below not correct?
void benchmark()
{
    std::vector<int> v;
    v.reserve(1);
    v.push_back(10);
    clobber();
}
If this wasn't confusing enough, folly (FB's C++ library) has an even stranger implementation.
Relevant snippet:
template <class T>
void doNotOptimizeAway(T&& datum) {
    asm volatile("" : "+r" (datum));
}
My understanding is that the above snippet informs the compiler that the assembly block will write to datum. But if the compiler finds there is no consumer of this datum, it can still optimize out the entity producing datum, right?
I assume this is not common knowledge, and any help is appreciated!
tl;dr doNotOptimizeAway creates an artificial "use".
A little bit of terminology here: a "def" ("definition") is a statement that assigns a value to a variable; a "use" is a statement that uses the value of a variable to perform some operation.
If, from the point immediately after a def, no path to the program exit encounters a use of that variable, the def is called dead, and the Dead Code Elimination (DCE) pass will remove it. That in turn may cause other defs to become dead (if the removed def was itself a use by virtue of having variable operands), and so on.
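To make the def/use terminology concrete, here is a minimal sketch. The function names are made up for illustration, and whether a particular compiler actually deletes the dead statements depends on the compiler and optimization level.

```cpp
#include <cassert>

// Hypothetical example: neither a nor b contributes to the result.
int dead_chain(int x) {
    int a = x * 2;  // def of a; its only use is inside the def of b
    int b = a + 1;  // def of b (and a use of a); b itself is never used
    return x;       // once b's def is removed, a's def becomes dead too
}

// Hypothetical example: the return statement is a use of b.
int live_chain(int x) {
    int a = x * 2;  // def of a
    int b = a + 1;  // def of b, use of a
    return b;       // use of b, which keeps both defs alive
}

// Both functions behave correctly; they differ only in what DCE may delete.
void check_chains() {
    assert(dead_chain(5) == 5);
    assert(live_chain(5) == 11);
}
```

In dead_chain, removing the def of b is what makes the def of a dead, which is the cascading effect described above.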
Imagine the program after the Scalar Replacement of Aggregates (SRA) pass, which turns the local std::vector into two variables, len and ptr. At some point the program assigns a value to ptr; that statement is a def.
Now, the original program didn't do anything with the vector; in other words, there weren't any uses of either len or ptr. Hence, all of their defs are dead and DCE can remove them, effectively removing all the code and making the benchmark worthless.
Adding doNotOptimizeAway(ptr) creates an artificial use, which prevents DCE from removing the defs. (As a side note, I see no point in the "+"; "g" should have been enough.)
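As a rough illustration of the artificial use, the sketch below applies the helper from the question to the result of a computation. The workload sum_to and the wrapper benchmark_sum are hypothetical names, and the inline asm syntax is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the snippet quoted in the question (GCC/Clang extended asm).
template <class T>
void doNotOptimizeAway(T&& datum) {
    asm volatile("" : "+r"(datum));
}

// Hypothetical workload: sum of 1..n.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

void benchmark_sum() {
    std::uint64_t result = sum_to(1000);  // def of result
    doNotOptimizeAway(result);            // artificial use: the def is no longer dead
}

// Sanity check of the workload itself.
void check_sum() {
    assert(sum_to(1000) == 500500);
}
```

Note that this only forces the value of result to exist in a register at that point. The compiler may still compute it at compile time, so the input usually has to be hidden from the optimizer as well if the loop itself is what you want to measure.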
A similar line of reasoning can be followed with memory loads and stores: a store (a def) is dead iff there is no path to the end of the program that contains a load (a use) from the stored-to location. As tracking arbitrary memory locations is a lot harder than tracking individual pseudo-register variables, the compiler reasons conservatively: it treats a store as dead only if there is no path to the end of the program that could possibly encounter a use of that store.
One such case is a store to a region of memory that is guaranteed not to be aliased: after that memory is deallocated, there could not possibly be a use of that store that does not trigger undefined behaviour. IOW, there are no such uses.
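A small sketch of that case, with hypothetical function names. Whether a given compiler actually removes the store (or the whole allocation) depends on the compiler and flags; the point is only that it is allowed to.

```cpp
#include <cassert>

// The store has no valid use: the memory is private and freed right after.
void dead_store() {
    int* p = new int(0);  // fresh allocation, nothing else can point at it
    *p = 42;              // store (a def) with no reachable load
    delete p;             // any load after this would be undefined behaviour
}

// Here a load happens before the deallocation, so the store has a real use.
int live_store() {
    int* p = new int(0);
    *p = 42;              // store
    int seen = *p;        // load (a use) of the stored value
    delete p;
    return seen;
}

void check_stores() {
    dead_store();
    assert(live_store() == 42);
}
```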
Thus a compiler could eliminate v.push_back(42). But here comes escape: it causes v.data() to be considered arbitrarily aliased, as @Leon described above.
The purpose of clobber() in the example is to create an artificial use of all of the aliased memory. We have a store (from push_back(42)), the store is to a location that is globally aliased (due to escape(v.data())), hence clobber() could potentially contain a use of that store (IOW, the store's side effect could be observable), and therefore the compiler is not allowed to remove the store.
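Putting the two helpers together, a compilable version of the benchmark might look like the sketch below. It assumes GCC or Clang (the asm syntax is not portable to MSVC); benchmark_push_back and the run_iterations driver are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <vector>

static void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}
static void clobber() {
    asm volatile("" : : : "memory");
}

void benchmark_push_back() {
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());  // the buffer must now be assumed reachable from elsewhere
    v.push_back(42);   // store into escaped memory
    clobber();         // potential use of all escaped memory, so the store stays
}

// Hypothetical driver: repeats the benchmark body and reports how often it ran.
int run_iterations(int n) {
    assert(n >= 0);
    int done = 0;
    for (int i = 0; i < n; ++i) {
        benchmark_push_back();
        ++done;
    }
    return done;
}
```

Dropping the escape call leaves the buffer unaliased, which is the situation in which clobber() alone is not enough to keep the store.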
A few simpler examples:
Example I:
void f() {
    int v[1];
    v[0] = 42;
}
This does not generate any code.
Example II:
extern void g();
void f() {
    int v[1];
    v[0] = 42;
    g();
}
This generates just a call to g(), no memory store. The function g cannot possibly access v because v is not aliased.
Example III:
void clobber() {
    __asm__ __volatile__ ("" : : : "memory");
}
void f() {
    int v[1];
    v[0] = 42;
    clobber();
}
As in the previous example, no store is generated, because v is not aliased and the call to clobber is inlined to nothing.
Example IV:
template<typename T>
void use(T &&t) {
    __asm__ __volatile__ ("" :: "g" (t));
}
void f() {
    int v[1];
    use(v);
    v[0] = 42;
}
This time v escapes (i.e. it can potentially be accessed from other activation frames). However, the store is still removed, since after it there are no potential uses of that memory (without UB).
Example V:
template<typename T>
void use(T &&t) {
    __asm__ __volatile__ ("" :: "g" (t));
}
extern void g();
void f() {
    int v[1];
    use(v);
    v[0] = 42;
    g(); // same with clobber()
}
And finally we get the store, because v escapes and the compiler must conservatively assume that the call to g may access the stored value.
(for experiments https://godbolt.org/g/rFviMI)