Understanding Linux perf report output

Tags:

Though I can intuitively get most of the results, I'm having hard time fully understanding the output of the perf report command especially for what concerns the call graph, so I wrote a stupid test to solve this issue of mine once for all.

The stupid test

I compiled what follows with:

gcc -Wall -pedantic -lm perf-test.c -o perf-test

No aggressive optimizations to avoid inlining and such.

#include <math.h>  #define N 10000000UL  #define USELESSNESS(n)                          \     do {                                        \         unsigned long i;                        \         double x = 42;                          \         for (i = 0; i < (n); i++) x = sin(x);   \     } while (0)  void baz() {     USELESSNESS(N); }  void bar() {     USELESSNESS(2 * N);     baz(); }  void foo() {     USELESSNESS(3 * N);     bar();     baz(); }  int main() {     foo();     return 0; }

Flat profiling

perf record ./perf-test perf report

With these I get:

  94,44%  perf-test  libm-2.19.so       [.] __sin_sse2    2,09%  perf-test  perf-test          [.] sin@plt    1,24%  perf-test  perf-test          [.] foo    0,85%  perf-test  perf-test          [.] baz    0,83%  perf-test  perf-test          [.] bar

Which sounds reasonable since the heavy work is actually performed by __sin_sse2 and sin@plt is probably just a wrapper, while the overhead of my functions take into account just the loop, overall: 3*N iterations for foo, 2*N for the other two.

Hierarchical profiling

perf record -g ./perf-test perf report -G perf report

Now the overhead columns that I get are two: Children (the output is sorted by this one by default) and Self (the same overhead of the flat profile).

Here is where I start feeling I miss something: regardless of the fact that I use -G or not I'm unable to explain the hierarchy in terms of "x calls y" or "y is called by x", for example:

without -G ("y is called by x"):

-   94,34%    94,06%  perf-test  libm-2.19.so       [.] __sin_sse2    - __sin_sse2       + 43,67% foo       + 41,45% main       + 14,88% bar -   37,73%     0,00%  perf-test  perf-test          [.] main      main      __libc_start_main -   23,41%     1,35%  perf-test  perf-test          [.] foo      foo      main      __libc_start_main -    6,43%     0,83%  perf-test  perf-test          [.] bar      bar      foo      main      __libc_start_main -    0,98%     0,98%  perf-test  perf-test          [.] baz    - baz       + 54,71% foo       + 45,29% bar

Why __sin_sse2 is called by main (indirectly?), foo and bar but not by baz?
Why functions sometimes have a percent and a hierarchy attached (e.g., the last instance of baz) and sometimes not (e.g., the last instance of bar)?

with -G ("x calls y"):

-   94,34%    94,06%  perf-test  libm-2.19.so       [.] __sin_sse2    + __sin_sse2    + __libc_start_main    + main -   37,73%     0,00%  perf-test  perf-test          [.] main    - main       + 62,05% foo       + 35,73% __sin_sse2         2,23% sin@plt -   23,41%     1,35%  perf-test  perf-test          [.] foo    - foo       + 64,40% __sin_sse2       + 29,18% bar       + 3,98% sin@plt         2,44% baz      __libc_start_main      main      foo

How should I interpret the first three entries under __sin_sse2?
main calls foo and that's ok, but why if it calls __sin_sse2 and sin@plt (indirectly?) it does not also call bar and baz?
Why do __libc_start_main and main appear under foo? And why foo appears twice?

Suspect is that there are two levels of this hierarchy, in which the second actually represents the "x calls y"/"y is called by x" semantics, but I'm tired to guess so I'm asking here. And the documentation doesn't seem to help.

Sorry for the long post but I hope that all this context may help or act as a reference for someone else too.

859

asked Jan 02 '15 12:01

cYrus

1 Answers

Alright, well, let's ignore temporarily the difference between caller and callee call-graphs, mostly because when I compare the results between these two options on my machine, I only see effects inside the kernel.kallsyms DSO for reasons I don't understand -- relatively new to this myself.

I found that for your example, it's a little easier to read the whole tree. So, using --stdio, let's look at the whole tree for __sin_sse2:

# Overhead    Command      Shared Object                  Symbol # ........  .........  .................  ...................... #     94.72%  perf-test  libm-2.19.so       [.] __sin_sse2             |             --- __sin_sse2                |                |--44.20%-- foo                |          |                |           --100.00%-- main                |                     __libc_start_main                |                     _start                |                     0x0                |                |--27.95%-- baz                |          |                |          |--51.78%-- bar                |          |          foo                |          |          main                |          |          __libc_start_main                |          |          _start                |          |          0x0                |          |                |           --48.22%-- foo                |                     main                |                     __libc_start_main                |                     _start                |                     0x0                |                 --27.84%-- bar                           |                            --100.00%-- foo                                      main                                      __libc_start_main                                      _start                                      0x0

So, the way I read this is: 44% of the time, sin is called from foo; 27% of the time it's called from baz, and 27% from bar.

The documentation for -g is instructive:

 -g [type,min[,limit],order[,key]], --call-graph        Display call chains using type, min percent threshold, optional print limit and order. type can be either:         ·   flat: single column, linear exposure of call chains.         ·   graph: use a graph tree, displaying absolute overhead rates.         ·   fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.                 order can be either:                - callee: callee based call graph.                - caller: inverted caller based call graph.                 key can be:                - function: compare on functions                - address: compare on individual code addresses                 Default: fractal,0.5,callee,function.

The important piece here is that the default is fractal, and in fractal mode, each branch is a new object.

So, you can see that 50% of the time that baz is called, it's called from bar, and the other 50% it's called from foo.

This isn't always the most useful measure, so it's instructive to look at the results using -g graph:

94.72%  perf-test  libm-2.19.so       [.] __sin_sse2         |         --- __sin_sse2            |            |--41.87%-- foo            |          |            |           --41.48%-- main            |                     __libc_start_main            |                     _start            |                     0x0            |            |--26.48%-- baz            |          |            |          |--13.50%-- bar            |          |          foo            |          |          main            |          |          __libc_start_main            |          |          _start            |          |          0x0            |          |            |           --12.57%-- foo            |                     main            |                     __libc_start_main            |                     _start            |                     0x0            |             --26.38%-- bar                       |                        --26.17%-- foo                                  main                                  __libc_start_main                                  _start                                  0x0

This changes to using absolute percentages, where each percentage of time is reported for that call chain: So foo->bar is 26% of the total ticks (which in turn calls baz), and foo->baz (direct) is 12% of the total ticks.

I still have no idea why I don't see any differences between callee and caller graphs though, from the perspective of __sin_sse2.

Update

One thing I did change from your command line is how the callgraphs were gathered. Linux perf by default uses the frame pointer method of reconstructing callstacks. This can be a problem when the compiler uses -fomit-frame-pointer as a default. So I used

perf record --call-graph dwarf ./perf-test

150

answered Sep 23 '22 03:09

Matthew G.

Related questions
                            
                                UIScrollView Programmatically in Swift
                            
                                Better way to require.extensions with Node.js
                            
                                External source maps for minified, transpiled ES6 code with webpack and gulp
                            
                                Get pitch and roll from matrix without singularities
                            
                                Fit div width to inline-block children (unknown number of divs per row)
                            
                                Accessing Google Inbox reminders using API
                            
                                Using `super` within an arrow function within an arrow function within a method
                            
                                RDS with Cloud Formation and AZ issues
                            
                                Persistent DB Connection in Django/WSGI application
                            
                                Elegantly define multi-dimensional array in modern C++
                            
                                How to combine (minify) compiled Angular 2 components?
                            
                                Will C++17 allow forward declaration of nested classes?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With