I am trying to understand if the removal of local intermediate variables could lead to better optimized code. Consider the following MWE, paying particular attention to the two functions f and g:
    struct A {
        double d;
    };

    struct B {
        double s;
    };

    struct C {
        A a;
        B b;
    };

    A geta();
    B getb();

    C f() {
        const A a = geta();
        const B b = getb();
        C c;
        c.a = a;
        c.b = b;
        return c;
    }

    C g() {
        C c;
        c.a = geta();
        c.b = getb();
        return c;
    }
Both f and g call geta() and getb() to populate an instance of class C which is then returned, but f uses two local intermediate variables to store the returned values of geta() and getb(), while g directly assigns the returned values to the members of c.
Compiling with GCC 9.2 at -O3, the binaries for the two functions f and g are exactly the same. However, adding another member variable to either class A or class B leads to different binaries. In particular, the binary for f has a few more instructions. The same holds for clang 8.0.0 with the -O3 flag.
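For concreteness, here is a sketch of the kind of variant meant above, assuming the extra member is a second double in A. The member name e and the function check_same_result are hypothetical, and geta() and getb() are given stand-in bodies only so that the snippet links and runs. To compare the generated code, keep them as bare declarations, as in the MWE, so the compiler cannot inline them.

```cpp
#include <cassert>

struct A {
    double d;
    double e;  // hypothetical extra member; this is the only change to the MWE types
};

struct B {
    double s;
};

struct C {
    A a;
    B b;
};

// Stand-in definitions so the example is runnable (assumption, not part of the MWE).
A geta() { return A{1.0, 2.0}; }
B getb() { return B{3.0}; }

// Version with local intermediate variables.
C f() {
    const A a = geta();
    const B b = getb();
    C c;
    c.a = a;
    c.b = b;
    return c;
}

// Version that assigns the results directly to the members.
C g() {
    C c;
    c.a = geta();
    c.b = getb();
    return c;
}

// Whatever the binaries look like, both versions must return the same object.
void check_same_result() {
    const C x = f();
    const C y = g();
    assert(x.a.d == y.a.d && x.a.e == y.a.e && x.b.s == y.b.s);
}
```

Calling check_same_result() only confirms that the two functions are observably equivalent here; it says nothing about which one compiles to fewer instructions.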
What is happening here? Why is the compiler not able to optimize away the local intermediate variables of f when A or B get a little more complex? Isn't the code of f and g equivalent?
In addition, the behavior is not the same for MSVC v19.22 with the /O2 flag: the Microsoft compiler already produces different binaries in the first case, i.e. with both classes A and B consisting of a single double.
I am using Godbolt: you can find here the code which produces different binaries.
Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
Compiler-specific pragma: GCC provides #pragma GCC as a way to temporarily control the compiler's behavior. With #pragma GCC optimize("O0"), the optimization level can be set to zero for the functions that follow, which means GCC performs no optimization on them.
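A minimal sketch of that pragma in use, assuming GCC. The function names sum_unoptimized, sum_default and check_sums_agree are hypothetical. Other compilers may ignore these pragmas, possibly with a warning, so this is not portable.

```cpp
#include <cassert>

#pragma GCC push_options        // remember the command-line optimization options
#pragma GCC optimize("O0")      // functions defined from here on are compiled at -O0

// Compiled without optimization under GCC, regardless of the -O flag.
double sum_unoptimized(const double* v, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

#pragma GCC pop_options         // restore the command-line options

// Compiled with whatever optimization level was given on the command line.
double sum_default(const double* v, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// The pragma changes code generation only, never the result.
void check_sums_agree() {
    const double v[] = {1.0, 2.0, 3.0};
    assert(sum_unoptimized(v, 3) == 6.0);
    assert(sum_default(v, 3) == 6.0);
}
```

The push_options / pop_options pair keeps the effect local to one function, which is usually what is wanted when comparing optimized and unoptimized output side by side.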
Compilers are free to optimize code so long as they can guarantee the semantics of the code are not changed. I would suggest starting at the Wikipedia page on compiler optimization, as there are many different kinds of optimization that are performed at many different stages.
This is a missed optimization
Neither function takes the address of C c, so escape analysis should easily prove it's a pure local that nothing else could have a pointer to. geta() and getb() can't be reading or writing that variable directly, therefore it's safe to store the geta() return value directly into c.a instead of a temporary on the stack.
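To see why that escape-analysis argument is the key step, here is a sketch of the opposite situation, where the address of c does leak out before the calls. Everything named observer, getb_peek, f_escaping, g_escaping and check_escape_changes_result is hypothetical and not part of the question. geta() gets a stand-in body so the example runs.

```cpp
#include <cassert>

struct A { double d; };
struct B { double s; };
struct C { A a; B b; };

// Hypothetical global through which the address of c escapes.
C* observer = nullptr;

// Stand-in definition (assumption) so the example is runnable.
A geta() { return A{1.0}; }

// Hypothetical getb variant that reads the caller's c through the escaped pointer.
B getb_peek() { return B{observer ? observer->a.d : -1.0}; }

// Intermediate-variable version: c.a is written only after both calls.
C f_escaping() {
    C c;
    c.a.d = 0.0;
    c.b.s = 0.0;
    observer = &c;              // the address of c escapes here
    const A a = geta();
    const B b = getb_peek();    // c.a has not been written yet, so this reads 0.0
    c.a = a;
    c.b = b;
    observer = nullptr;
    return c;
}

// Direct-assignment version: c.a is written before getb_peek() runs.
C g_escaping() {
    C c;
    c.a.d = 0.0;
    c.b.s = 0.0;
    observer = &c;              // the address of c escapes here
    c.a = geta();
    c.b = getb_peek();          // c.a already holds 1.0, so this reads 1.0
    observer = nullptr;
    return c;
}

// Once c escapes, the two versions are no longer interchangeable.
void check_escape_changes_result() {
    assert(f_escaping().b.s == 0.0);
    assert(g_escaping().b.s == 1.0);
}
```

In this escaping case a compiler must not turn one version into the other. In the original f and g nothing like observer exists, which is why storing straight into c.a would be legal there.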
Surprisingly, GCC, clang, ICC, and MSVC all miss this optimization, most of them using call-preserved registers to hold the geta() return value until after getb() (https://godbolt.org/z/WQ9MAF). That's at least the case for x86-64; I mostly didn't check other ISAs or older compiler versions.
Fun fact: clang 3.5 has this missed-optimization even for g(), defeating the source code's attempt to be efficient.
Fun fact #2: With GCC 9.2, compiling as C instead of C++ makes GCC do a much worse job, deoptimizing g(). (I had to change to typedef struct Atag {...} A; but compiling that as C++ still optimizes g(). https://godbolt.org/z/_Y95nj)
clang 8.0 produces an efficient g() with or without -xc, and ICC produces an inefficient g() either way.
ICC's f() is even worse than its g().
MSVC's g() is about as efficient as you could hope for; the Windows x64 calling convention returns the struct by hidden pointer, and MSVC never optimizes that to passing a pointer to its own return-value object. (Which it probably couldn't prove is safe anyway, if its own caller was also potentially doing such optimizations.)
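As a rough illustration of that point, the following is a hand-written C++ model of return-by-hidden-pointer, not actual compiler output. It assumes a two-member A, on the assumption that such a struct is too large to come back in a register. The names geta_lowered, g_lowered_copy, g_lowered_direct and check_lowerings_agree are all hypothetical.

```cpp
#include <cassert>

struct A { double d; double e; };  // assumed large enough to be returned via hidden pointer
struct B { double s; };
struct C { A a; B b; };

// Model of "A geta()" after lowering: the caller passes where the result should go.
void geta_lowered(A* ret) {
    ret->d = 1.0;
    ret->e = 2.0;
}

// Stand-in definition (assumption) so the example is runnable.
B getb() { return B{3.0}; }

// Model of the conservative code: geta writes into a temporary in g's own
// frame, which is then copied into the caller-provided return object.
void g_lowered_copy(C* ret) {
    A tmp;
    geta_lowered(&tmp);
    ret->a = tmp;
    ret->b = getb();
}

// Model of the optimization described above: hand geta a pointer straight
// into g's own return-value object, skipping the temporary and the copy.
void g_lowered_direct(C* ret) {
    geta_lowered(&ret->a);
    ret->b = getb();
}

// Both lowerings fill the return object identically when nothing else aliases it.
void check_lowerings_agree() {
    C x;
    C y;
    g_lowered_copy(&x);
    g_lowered_direct(&y);
    assert(x.a.d == y.a.d && x.a.e == y.a.e && x.b.s == y.b.s);
}
```

The direct form is only valid if geta cannot otherwise observe the object that ret points to, which is the proof obligation the parenthetical above refers to.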
Obviously if geta() and getb() can inline, that removes any doubt for the optimizer and it should do the optimization more easily / reliably.