
What is the difference between -fprofile-use and -fauto-profile?

Tags:

c++

gcc

What is the difference between -fprofile-use and -fauto-profile?

Here's what the docs say:

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options

-fprofile-use

-fprofile-use=path

Enable profile feedback-directed optimizations, and the following optimizations which are generally profitable only with profile feedback available: [...]

If path is specified, GCC looks at the path to find the profile feedback data files. See -fprofile-dir.

and underneath that

-fauto-profile

-fauto-profile=path

Enable sampling-based feedback-directed optimizations, and the following optimizations which are generally profitable only with profile feedback available: [...]

path is the name of a file containing AutoFDO profile information. If omitted, it defaults to fbdata.afdo in the current directory.

(The list of optimizations in the [...] for -fauto-profile is longer.)

asked Apr 17 '15 by Praxeolitic



2 Answers

I stumbled into this thread by a path I can't even remember and am learning this stuff as I go along. But I don't like seeing an unanswered question if I could learn something from it! So I got reading.

Feedback-Directed Optimisation

As GCC says, both of these are modes of applying Feedback-Directed Optimisation. By running the program and profiling what it does, how it does it, and how long it spends in which functions, we can enable extra, directed optimisations from the resulting data. Results from the profiler are 'fed forward' to the optimiser. Next, presumably, you can take your profile-optimised binary and profile that, then compile another FDO'd version, and so on... hence the feedback part of the name.

The real answer, the difference between these two switches, isn't very clearly documented, but it's available if we just need to look a little further.

-fprofile-use

Firstly, your quote for -fprofile-use only really states that it requires -fprofile-generate, an option that isn't very well documented: the reference from -use just tells you to read the page you're already on, where -generate is only ever mentioned but never defined. Useful! But! We can refer to the answers to this question: How to use profile guided optimizations in g++?

As that answer states, and the piece of GCC's documentation in question here gently indicates... -fprofile-generate causes instrumentation to be added to the output binary. As that page summarises, an instrumented executable has stuff added to facilitate extra checks or insights during its runtime.
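To make that concrete, here is a minimal sketch of the instrumented cycle (the file names and the -O2 level are placeholders of mine, not anything the GCC docs mandate):

g++ -O2 -fprofile-generate myprog.cpp -o myprog    # instrumented build
./myprog                                           # run a representative workload; the instrumentation writes .gcda profile files on exit
g++ -O2 -fprofile-use myprog.cpp -o myprog         # rebuild, optimising with the recorded profile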

(The other form of instrumentation I know - and the one I've used - is the compiler add-on library UBSan, which I use via GCC's -fsanitize=undefined option. This catches bits of Undefined Behaviour at runtime. GCC with this on has revealed UB I might've otherwise taken ages to find - and made me wonder how my programs ran at all! Clang can use this library too, and maybe other compilers.)

-fauto-profile

In contrast, -fauto-profile is different. The key distinction is hinted, if not clearly, in the synopsis you quoted for it:

path is the name of a file containing AutoFDO profile information.

This mode handles profiling and subsequent optimisations using AutoFDO. To Google we go: AutoFDO. The first few lines don't explain it as succinctly as they might, and I think the best summary is buried rather far down the page:

The major difference between AutoFDO [-fauto-profile] and FDO [-fprofile-use] is that AutoFDO profiles on optimized binary instead of instrumented binary. This makes it very different in handling cloned functions.

How does it do this? -fauto-profile requires you to provide profiling files written out by the Linux kernel's profiler, Perf, converted to the AutoFDO format. Perf, rather than adding instrumentation, uses hardware features of the CPU and kernel-level features of the OS to profile various statistics about a program while it's running:

perf is powerful: it can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing). It is capable of lightweight profiling. [...] Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots.

So, that lets it profile an optimised program, rather than an instrumented one. We might reasonably presume this is more representative of how your program would react in the real world - and so can facilitate gathering more useful profiling data and applying more effective optimisations as a result.

An example of how to do the legwork of tying all this together and getting -fauto-profile to do something with your program is summarised here: Feedback directed optimization with GCC and Perf

(Maybe now that I learned all this, I'll try these options out some day!)

answered Oct 16 '22 by underscore_d


underscore_d gives an in-depth insight into the differences.

Here is my take on it.

Internal profiling works by compiling initially with -fprofile-generate, which builds the profiler into the binary for the data-collection run. Execute that binary for 10 minutes, or however long you think covers enough activity for the profiler to record. Then recompile with -fprofile-use, adding -fprofile-correction if it is a multi-threaded application (sketched below). Running the instrumented binary causes a significant performance hit (25% in my case), which does not reflect the behaviour of the real, non-instrumented binary and so could make the profile less accurate; but if all of the program's activity scales evenly with that penalty, I guess it should not matter.
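A rough sketch of that sequence, with placeholder file names (only -fprofile-generate, -fprofile-use and -fprofile-correction come from the description above):

g++ -O2 -fprofile-generate your_program.cpp -o your_program
./your_program        # run long enough to cover representative activity
g++ -O2 -fprofile-use -fprofile-correction your_program.cpp -o your_program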

Alternatively, you can use the perf tool (more error-prone and more effort), which is specific to your kernel (and may require a kernel built with profiling/tracing support), to create the profiling data. This could be considered external profiling, and it has negligible impact on the application's performance while it is being profiled. You run it against a binary compiled normally. I cannot find any studies comparing the two approaches.

perf record -e br_inst_retired:near_taken -b -o perf.data your_program.unstripped -program -parameters

then without stripping the binary, convert the profiling data into something GCC understands...

create_gcov --binary=your_program.unstripped --profile=perf.data --gcov=profile.afdo

Then recompile the application using -fauto-profile, as shown below. Version-specific issues between perf and AutoFDO/create_gcov are known to exist. I referred to https://blog.wnohang.net/index.php/2015/04/29/feedback-directed-optimization-with-gcc-and-perf/ for detailed information on this alternative profiling method.
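In command form, that final step might look like this (assuming the profile.afdo file produced by create_gcov above; the source file name and -O2 are placeholders):

g++ -O2 -fauto-profile=profile.afdo your_program.cpp -o your_program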

-fprofile-use and -fauto-profile both enable many optimization options by default; in my case that included the unwanted -funroll-loops, which I knew had a negative impact on performance in my application. If you're the pedantic type, you can test option combinations by including the disabling counterpart in the compile flags, in my case -fno-unroll-loops.
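For example (only -fno-unroll-loops is specific to my case; substitute whatever flag you want to override):

g++ -O2 -fauto-profile=profile.afdo -fno-unroll-loops your_program.cpp -o your_program

If you want to check which optimisation flags a given combination actually turns on, something like g++ -Q --help=optimizers -O2 -fprofile-use should list them as enabled or disabled.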

Using internal profiling with my program, the binary (after stripping) came out 25% smaller than the original stripped, non-profiled binary. However, I only observed sub-percentile performance gains, and the work-output fluctuations reported by the program's log (it's a cryptocurrency miner) became more erratic, rather than the gradual rise and fall between peaks and troughs in hash rate that I saw originally.

Overall, a stab in the dark.

answered Oct 16 '22 by Rauli Kumpulainen