I have noticed that the GCC flag -ftree-vectorize is very useful for optimizing code.
I am trying to understand better how it works, but the doc is fairly concise:
Perform vectorization on trees. This flag enables -ftree-loop-vectorize and -ftree-slp-vectorize if not explicitly specified.
Does anyone know the inner workings of this flag?
Modern versions of GCC enable -ftree-vectorize at -O3, so just use -O3 with GCC 4.x and later. (Clang enables auto-vectorization at -O2. ICC defaults to optimization enabled plus fast-math.) Most of the following was written by Peter Cordes, who could have just written a new answer. Compiler options and compiler output will change over time as compilers change.
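As a minimal sketch of the kind of loop the loop vectorizer targets, consider the function below. The function name saxpy and the file name saxpy.c are only illustrative. Compiling with something like `gcc -O3 -fopt-info-vec -c saxpy.c` should make GCC report which loops it vectorized; the exact wording of that report depends on the GCC version.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The restrict qualifiers promise that x and y do not overlap, which
 * spares the compiler from emitting runtime overlap checks before
 * using SIMD loads and stores. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Building the same file with -O2 -fno-tree-vectorize and comparing the assembly (gcc -S) is a simple way to see what the flag changes.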
Although the behavior is similar to the Gold linker's ICF optimization, GCC ICF works on different levels, so the optimizations are not the same: some equivalences are found only by GCC and others only by Gold. This flag (-fipa-icf) is enabled by default at -O2 and -Os.
Perform basic block vectorization on trees. This flag is enabled by default at -O3 and by -ftree-vectorize, -fprofile-use, and -fauto-profile.
Initialize automatic variables with either a pattern or with zeroes to increase the security and predictability of a program by preventing uninitialized memory disclosure and use.
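Basic block (SLP) vectorization, as opposed to loop vectorization, looks for similar independent statements in straight-line code. The sketch below is a hypothetical example (the name add4 is made up) of code that is a candidate for it. Whether GCC actually merges the four additions into one SIMD add depends on the target and the optimization level.

```c
/* Four independent statements of the same shape on adjacent elements.
 * There is no loop here, so only the basic block (SLP) vectorizer can
 * combine them into a single vector operation. */
void add4(float *restrict out, const float *restrict a, const float *restrict b)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```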
Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program. The compiler performs optimization based on the knowledge it has of the program.
Trees are an internal code representation used by GCC, and tree vectorization happens at this stage. In this representation, it is fairly easy to spot repeated instructions. If the code generator can emit SIMD instructions, it helps to bundle those repeated instructions while the code is still in tree form.
See tree-vectorizer.c for details.
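To make "bundling repeated instructions" concrete, here is a rough sketch that does the bundling by hand using GCC's vector extensions (also accepted by Clang). The function names are made up for illustration, and this is not what the vectorizer literally emits; it only shows the idea of replacing four scalar additions with one vector addition.

```c
#include <string.h>

/* GCC vector extension: four floats treated as a single 16-byte value. */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar form: the same addition repeated four times. */
void add_scalar(float *out, const float *a, const float *b)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* Bundled form: one vector addition covers all four elements.
 * memcpy is used for the loads and stores so the pointers do not
 * need to be 16-byte aligned. */
void add_bundled(float *out, const float *a, const float *b)
{
    v4sf va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;
    memcpy(out, &vr, sizeof vr);
}
```

Both functions compute the same result; the vectorizer's job is to get from the first form to something like the second automatically.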