With most C/C++ compilers, there's a flag you can pass to the compiler, -march=native, which tells the compiler to tune generated code for the micro-architecture and ISA extensions of the host CPU. Even if it doesn't go by the same name, there's typically an equivalent option for LLVM-based compilers, like rustc or swiftc.
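One way to see what the flag changes on a given machine is to check the feature macros that GCC and Clang predefine for each enabled ISA extension. A minimal sketch, assuming an x86 target and a GCC- or Clang-compatible compiler; the function names compiled_simd_level and print_simd_level are made up for illustration:

```c
#include <stdio.h>

/* Reports the widest x86 SIMD level the compiler was allowed to target,
   judged by the feature macros GCC and Clang predefine.
   Assumption: x86 target; on other targets this reports "none detected". */
const char *compiled_simd_level(void) {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}

/* Hypothetical helper: prints the level chosen at compile time. */
void print_simd_level(void) {
    printf("compiled for: %s\n", compiled_simd_level());
}
```

Building this once with gcc -O2 and once with gcc -O2 -march=native, and calling print_simd_level() from main in each build, should show whether the host CPU offers anything beyond the baseline the compiler targets by default.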
In my own experience, this flag can provide massive speedups for numerically-intensive code, and it sounds like it would be free of compromises for code you're just compiling for your own machine. That said, I don't think I've seen any build system or static compiler that enables it by default:
Obviously, any command-line compiler executable that requires you to pass it doesn't use it by default.
I can't think of any IDE that enables this by default.
I can't think of any common build system I've worked with (cmake, automake, cargo, spm, etc.) that enables it by default, even for optimized builds.
I can think of a few reasons for this, but none of them are really satisfactory:
Using -march=native is inappropriate for binaries that will be distributed to other machines. That said, I find myself compiling sources for my own machine much more often than for others, and this doesn't explain its lack of use in debug builds, where there's no intention for distribution.
At least on Intel x86 CPUs, it's my understanding that using AVX instructions infrequently could degrade performance or power efficiency, since the AVX unit is powered down when not in use, requiring it to be powered up to be used, and a lot of Intel CPUs downclock to run AVX instructions. Still, it only explains why AVX wouldn't be enabled, not why the code wouldn't be tuned for the particular micro-architecture's handling of regular instructions.
Since most x86 CPUs use fancy out-of-order superscalar pipelines with register renaming, tuning code for a particular micro-architecture probably isn't particularly important. Still, if it could help, why not use it?
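For the distribution concern in particular, a binary built for the baseline ISA can still pick a faster path at run time instead of baking in the host's features. The following is only a sketch, assuming GCC or Clang on x86 (it relies on the target function attribute and __builtin_cpu_supports, both compiler extensions); sum_baseline, sum_avx2 and sum are hypothetical names:

```c
#include <stddef.h>

/* Baseline version: compiled for whatever ISA level the build targets. */
static float sum_baseline(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_VARIANT 1
/* Same loop, but the compiler may use AVX2 instructions for this one
   function only, regardless of the -march setting of the whole build. */
__attribute__((target("avx2")))
static float sum_avx2(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}
#endif

/* Dispatcher: checks the CPU the program is actually running on. */
float sum(const float *x, size_t n) {
#ifdef HAVE_AVX2_VARIANT
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(x, n);
#endif
    return sum_baseline(x, n);
}
```

The binary stays runnable on older CPUs, at the cost of maintaining the dispatch by hand, which is part of why a blanket -march=native looks tempting for purely local builds.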
If you take a closer look at the defaults of gcc, the oldest compiler in your list, you'll realize that they are very conservative:
-Wall and -Wextra have not changed for years; there are new useful warnings, but they are NOT added to -Wall or -Wextra. Why? Because it would break things!
There are entire development chains relying on those convenience defaults, and any alteration brings the risk of either breaking them, or of producing binaries that will not run on the targets.
The more users, the greater the threat, so developers of gcc are very, very conservative to avoid world-wide breakage. And developers of the next batch of compilers follow in the footsteps of their elders: it's proven to work.
Note: rustc will default to static linking, and boasts that you can just copy the binary and drop it on another machine; obviously -march=native would be an impediment there.
And the truth is, it probably doesn't matter. You actually recognized it yourself:
In my own experience, this flag can provide massive speedups for numerically-intensive code
Most code is full of virtual calls and branches (typically OO code) and not at all numerically-intensive. Thus, for the majority of code, SSE2 is often sufficient.
The few codebases for which performance really matters will require significant time invested in performance tuning anyway, both at code and compiler level. And if vectorization matters, it won't be left at the whim of the compiler: developers will use the built-in intrinsics and write the vectorized code themselves, as it's cheaper than putting up a monitoring tool to ensure that auto-vectorization did happen.
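To make "write the vectorized code themselves" concrete, here is a small sketch using SSE intrinsics, with a scalar fallback when SSE2 is not enabled at compile time. The function name add_arrays is hypothetical; the intrinsics are the standard ones reachable through <emmintrin.h>:

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 header; also pulls in the SSE float intrinsics */
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). Pointers need not be aligned. */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    /* Process four floats per iteration with unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Because the vector width is spelled out in the source, the result does not depend on whether the auto-vectorizer fires, and the function needs nothing beyond the SSE2 baseline that x86-64 guarantees.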
Also, even for numerically intensive code, the host machine and the target machine might differ slightly. Compilation benefits from lots of cores, even at a lower frequency, while execution benefits from a high frequency and possibly fewer cores, unless the work is easily parallelizable.
Not activating -march=native by default makes it easier for users to get started; since even performance seekers may not care for it much, this means there's more to lose than gain.
In an alternative history where the default had been -march=native from the beginning, users would be used to specifying the target architecture, and we would not be having this discussion.