Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding Linux perf report output

Tags:

Though I can intuitively get most of the results, I'm having hard time fully understanding the output of the perf report command especially for what concerns the call graph, so I wrote a stupid test to solve this issue of mine once for all.

The stupid test

I compiled what follows with:

gcc -Wall -pedantic -lm perf-test.c -o perf-test 

No aggressive optimizations to avoid inlining and such.

#include <math.h>  #define N 10000000UL  #define USELESSNESS(n)                          \     do {                                        \         unsigned long i;                        \         double x = 42;                          \         for (i = 0; i < (n); i++) x = sin(x);   \     } while (0)  void baz() {     USELESSNESS(N); }  void bar() {     USELESSNESS(2 * N);     baz(); }  void foo() {     USELESSNESS(3 * N);     bar();     baz(); }  int main() {     foo();     return 0; } 

Flat profiling

perf record ./perf-test perf report 

With these I get:

  94,44%  perf-test  libm-2.19.so       [.] __sin_sse2    2,09%  perf-test  perf-test          [.] sin@plt    1,24%  perf-test  perf-test          [.] foo    0,85%  perf-test  perf-test          [.] baz    0,83%  perf-test  perf-test          [.] bar 

Which sounds reasonable since the heavy work is actually performed by __sin_sse2 and sin@plt is probably just a wrapper, while the overhead of my functions take into account just the loop, overall: 3*N iterations for foo, 2*N for the other two.

Hierarchical profiling

perf record -g ./perf-test perf report -G perf report 

Now the overhead columns that I get are two: Children (the output is sorted by this one by default) and Self (the same overhead of the flat profile).

Here is where I start feeling I miss something: regardless of the fact that I use -G or not I'm unable to explain the hierarchy in terms of "x calls y" or "y is called by x", for example:

  • without -G ("y is called by x"):

    -   94,34%    94,06%  perf-test  libm-2.19.so       [.] __sin_sse2    - __sin_sse2       + 43,67% foo       + 41,45% main       + 14,88% bar -   37,73%     0,00%  perf-test  perf-test          [.] main      main      __libc_start_main -   23,41%     1,35%  perf-test  perf-test          [.] foo      foo      main      __libc_start_main -    6,43%     0,83%  perf-test  perf-test          [.] bar      bar      foo      main      __libc_start_main -    0,98%     0,98%  perf-test  perf-test          [.] baz    - baz       + 54,71% foo       + 45,29% bar 
    1. Why __sin_sse2 is called by main (indirectly?), foo and bar but not by baz?
    2. Why functions sometimes have a percent and a hierarchy attached (e.g., the last instance of baz) and sometimes not (e.g., the last instance of bar)?
  • with -G ("x calls y"):

    -   94,34%    94,06%  perf-test  libm-2.19.so       [.] __sin_sse2    + __sin_sse2    + __libc_start_main    + main -   37,73%     0,00%  perf-test  perf-test          [.] main    - main       + 62,05% foo       + 35,73% __sin_sse2         2,23% sin@plt -   23,41%     1,35%  perf-test  perf-test          [.] foo    - foo       + 64,40% __sin_sse2       + 29,18% bar       + 3,98% sin@plt         2,44% baz      __libc_start_main      main      foo 
    1. How should I interpret the first three entries under __sin_sse2?
    2. main calls foo and that's ok, but why if it calls __sin_sse2 and sin@plt (indirectly?) it does not also call bar and baz?
    3. Why do __libc_start_main and main appear under foo? And why foo appears twice?

Suspect is that there are two levels of this hierarchy, in which the second actually represents the "x calls y"/"y is called by x" semantics, but I'm tired to guess so I'm asking here. And the documentation doesn't seem to help.


Sorry for the long post but I hope that all this context may help or act as a reference for someone else too.

like image 859
cYrus Avatar asked Jan 02 '15 12:01

cYrus


People also ask

How do I read a perf data file?

There is builtin perf. data parser and printer in perf tool of linux tools with subcommand "script". perf-script - Read perf. data (created by perf record) and display trace output This command reads the input file and displays the trace recorded.

What is perf report?

perf report is able to auto-detect whether a perf. data file contains branch stacks and it will automatically switch to the branch view mode, unless --no-branch-stack is used. --branch-history Add the addresses of sampled taken branches to the callstack. This allows to examine the path the program took to each sample.

What does overhead mean in perf?

Sampling with perf record Overhead: the percentage of overall samples collected in the corresponding function.

How does perf work in linux?

Perf works on the Model Specific Registers of your CPU for measurements like cycles or branch-misses or so. A special Part called PMU(Performance Measurement Unit) is counting all kinds of events.


1 Answers

Alright, well, let's ignore temporarily the difference between caller and callee call-graphs, mostly because when I compare the results between these two options on my machine, I only see effects inside the kernel.kallsyms DSO for reasons I don't understand -- relatively new to this myself.

I found that for your example, it's a little easier to read the whole tree. So, using --stdio, let's look at the whole tree for __sin_sse2:

# Overhead    Command      Shared Object                  Symbol # ........  .........  .................  ...................... #     94.72%  perf-test  libm-2.19.so       [.] __sin_sse2             |             --- __sin_sse2                |                |--44.20%-- foo                |          |                |           --100.00%-- main                |                     __libc_start_main                |                     _start                |                     0x0                |                |--27.95%-- baz                |          |                |          |--51.78%-- bar                |          |          foo                |          |          main                |          |          __libc_start_main                |          |          _start                |          |          0x0                |          |                |           --48.22%-- foo                |                     main                |                     __libc_start_main                |                     _start                |                     0x0                |                 --27.84%-- bar                           |                            --100.00%-- foo                                      main                                      __libc_start_main                                      _start                                      0x0 

So, the way I read this is: 44% of the time, sin is called from foo; 27% of the time it's called from baz, and 27% from bar.

The documentation for -g is instructive:

 -g [type,min[,limit],order[,key]], --call-graph        Display call chains using type, min percent threshold, optional print limit and order. type can be either:         ·   flat: single column, linear exposure of call chains.         ·   graph: use a graph tree, displaying absolute overhead rates.         ·   fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.                 order can be either:                - callee: callee based call graph.                - caller: inverted caller based call graph.                 key can be:                - function: compare on functions                - address: compare on individual code addresses                 Default: fractal,0.5,callee,function. 

The important piece here is that the default is fractal, and in fractal mode, each branch is a new object.

So, you can see that 50% of the time that baz is called, it's called from bar, and the other 50% it's called from foo.

This isn't always the most useful measure, so it's instructive to look at the results using -g graph:

94.72%  perf-test  libm-2.19.so       [.] __sin_sse2         |         --- __sin_sse2            |            |--41.87%-- foo            |          |            |           --41.48%-- main            |                     __libc_start_main            |                     _start            |                     0x0            |            |--26.48%-- baz            |          |            |          |--13.50%-- bar            |          |          foo            |          |          main            |          |          __libc_start_main            |          |          _start            |          |          0x0            |          |            |           --12.57%-- foo            |                     main            |                     __libc_start_main            |                     _start            |                     0x0            |             --26.38%-- bar                       |                        --26.17%-- foo                                  main                                  __libc_start_main                                  _start                                  0x0 

This changes to using absolute percentages, where each percentage of time is reported for that call chain: So foo->bar is 26% of the total ticks (which in turn calls baz), and foo->baz (direct) is 12% of the total ticks.

I still have no idea why I don't see any differences between callee and caller graphs though, from the perspective of __sin_sse2.

Update

One thing I did change from your command line is how the callgraphs were gathered. Linux perf by default uses the frame pointer method of reconstructing callstacks. This can be a problem when the compiler uses -fomit-frame-pointer as a default. So I used

perf record --call-graph dwarf ./perf-test 
like image 150
Matthew G. Avatar answered Sep 23 '22 03:09

Matthew G.