According to most benchmarks, Intel's Clear Linux is noticeably faster than other distributions, largely thanks to a GCC feature called Function Multi-Versioning (FMV). Their current method is to compile the code, analyze which functions contain vectorized loops, patch those functions with FMV attributes, and compile again.
How feasible would it be for GCC to do this automatically, for example via an option like -mmultiarch=sandybridge,skylake (or a similar -m option listing CPU extensions such as AVX and AVX2)?
Right now I'm interested in two usage scenarios:
No, but it doesn't matter. There's very, very little code that will actually benefit from this. For the most part, doing it globally will just make your system much more memory-constrained and slower because of the huge increase in code size (unless special effort is made to place matching versions together in the same pages). Most real workloads aren't even CPU-bound; they're syscall-overhead-bound, GPU-bound, IO-bound, etc. And many of the modern ones that are CPU-bound aren't running precompiled code but JIT'd code (e.g. everything running in a browser, whether that's your real browser or the outdated and unpatched fork of Chrome in every Electron app).
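For context, the manual step the question describes (patching a function with an FMV attribute) looks roughly like the sketch below. It uses GCC's `target_clones` function attribute; the `saxpy` function and the `FMV_CLONES` macro are hypothetical names chosen for illustration, and the guard assumes an x86-64 Linux toolchain with IFUNC support (as with glibc). This is not necessarily what Clear Linux's tooling emits.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC or a compatible compiler targeting x86-64 Linux, where
   the clones are selected through an IFUNC resolver at load time.
   Elsewhere the macro expands to nothing and one version is built. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
#define FMV_CLONES __attribute__((target_clones("avx2", "avx", "default")))
#else
#define FMV_CLONES
#endif

/* Hypothetical hot loop: y[i] += a * x[i].
   With the attribute active, the compiler emits one copy of this function
   per listed target plus a resolver that picks one for the running CPU. */
FMV_CLONES
void saxpy(float a, const float *x, float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Callers just call `saxpy(...)` as usual. Note that every listed target adds another full copy of the function body to the binary, which is the code-size cost described above if this were applied to everything rather than to a few vectorizable hot spots.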