
linux perf: how to interpret and find hotspots

I tried out Linux's perf utility today and am having trouble interpreting its results. I'm used to valgrind's callgrind, which is of course a totally different approach from perf's sampling-based method.

What I did:

perf record -g -p $(pidof someapp)
perf report -g -n

Now I see something like this:

 +     16.92%  kdevelop  libsqlite3.so.0.8.6               [.] 0x3fe57
 +     10.61%  kdevelop  libQtGui.so.4.7.3                 [.] 0x81e344
 +      7.09%  kdevelop  libc-2.14.so                      [.] 0x85804
 +      4.96%  kdevelop  libQtGui.so.4.7.3                 [.] 0x265b69
 +      3.50%  kdevelop  libQtCore.so.4.7.3                [.] 0x18608d
 +      2.68%  kdevelop  libc-2.14.so                      [.] memcpy
 +      1.15%  kdevelop  [kernel.kallsyms]                 [k] copy_user_generic_string
 +      0.90%  kdevelop  libQtGui.so.4.7.3                 [.] QTransform::translate(double, double)
 +      0.88%  kdevelop  libc-2.14.so                      [.] __libc_malloc
 +      0.85%  kdevelop  libc-2.14.so                      [.] memcpy
 ...

OK, these functions might be slow, but how do I find out where they are being called from? Since all these hotspots lie in external libraries, I see no way to optimize my code.

Basically I am looking for some kind of callgraph annotated with accumulated cost, where my functions have a higher inclusive sampling cost than the library functions I call.

Is this possible with perf? If so - how?

Note: I found out that pressing "E" expands the callgraph and gives somewhat more information. But the callgraph is often not deep enough and/or terminates randomly without showing how much time was spent where. Example:

 -     10.26%  kate  libkatepartinterfaces.so.4.6.0  [.] Kate::TextLoader::readLine(int&...
      Kate::TextLoader::readLine(int&, int&)
      Kate::TextBuffer::load(QString const&, bool&, bool&)
      KateBuffer::openFile(QString const&)
      KateDocument::openFile()
      0x7fe37a81121c

Could it be an issue that I'm running on 64-bit? See also: http://lists.fedoraproject.org/pipermail/devel/2010-November/144952.html (I'm not using Fedora, but it seems to apply to all 64-bit systems).

milianw asked Aug 11 '11 18:08





1 Answer

With Linux 3.7 perf is finally able to use DWARF information to generate the callgraph:

perf record --call-graph dwarf -- yourapp
perf report -g graph --no-children

Neat, but the curses GUI is horrible compared to VTune, KCacheGrind or similar tools. I recommend trying out FlameGraphs instead, which is a pretty neat visualization: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
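As a rough sketch of how a flame graph can be produced from a recording: the stackcollapse-perf.pl and flamegraph.pl scripts come from the FlameGraph repository (https://github.com/brendangregg/FlameGraph). The helper name make_flamegraph, the FLAMEGRAPH_DIR variable and the default file names are assumptions made for this illustration, not part of perf or FlameGraph.

```shell
# Assumption: the FlameGraph scripts were cloned into this directory, e.g.
#   git clone https://github.com/brendangregg/FlameGraph "$HOME/FlameGraph"
FLAMEGRAPH_DIR="${FLAMEGRAPH_DIR:-$HOME/FlameGraph}"

# make_flamegraph (hypothetical helper name)
#   $1: perf.data file to read   (default: perf.data)
#   $2: SVG file to write        (default: flamegraph.svg)
make_flamegraph() {
    perf_data="${1:-perf.data}"
    out_svg="${2:-flamegraph.svg}"
    # perf script dumps the recorded samples with their call stacks,
    # stackcollapse-perf.pl folds each stack into one line with a count,
    # flamegraph.pl renders the folded stacks as an SVG.
    perf script -i "$perf_data" \
        | "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" \
        | "$FLAMEGRAPH_DIR/flamegraph.pl" > "$out_svg"
}
```

After a recording with call graphs enabled, running make_flamegraph perf.data yourapp.svg should leave an SVG that can be opened in a browser.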

Note: In the report step, -g graph makes the output easier to read by showing percentages relative to the total rather than relative to the parent. --no-children shows only self cost instead of inclusive cost, which I also find invaluable.
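To make the difference between self cost and inclusive cost concrete, here is a small sketch that computes both from folded stacks (one line per stack, frames separated by ";", followed by a sample count, the format stackcollapse-perf.pl emits). The stacks and sample counts are made up for illustration; this is not how perf computes its columns internally.

```shell
# Hypothetical folded stacks: root;...;leaf sample_count
folded='main;load;readLine 60
main;load;readLine;memcpy 30
main;paint 10'

# Self cost: samples in which the function is the leaf (innermost) frame.
self_cost=$(printf '%s\n' "$folded" | awk '
  { count = $NF; stack = $0; sub(/ [0-9]+$/, "", stack)
    n = split(stack, f, ";"); self[f[n]] += count }
  END { for (k in self) print k, self[k] }' | LC_ALL=C sort)

# Inclusive cost: samples in which the function appears anywhere in the
# stack, counted once per stack even if the function recurses.
inclusive_cost=$(printf '%s\n' "$folded" | awk '
  { count = $NF; stack = $0; sub(/ [0-9]+$/, "", stack)
    n = split(stack, f, ";"); split("", seen)
    for (i = 1; i <= n; i++)
      if (!(f[i] in seen)) { seen[f[i]] = 1; incl[f[i]] += count } }
  END { for (k in incl) print k, incl[k] }' | LC_ALL=C sort)

printf 'self:\n%s\n\ninclusive:\n%s\n' "$self_cost" "$inclusive_cost"
```

With this input, main has no self cost at all but an inclusive cost of 100 samples, while memcpy has 30 samples of both. That is the view the question asks for: application functions rank above the library functions they call.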

If you have a recent perf and an Intel CPU, also try out the LBR unwinder, which has much better performance and produces far smaller result files:

perf record --call-graph lbr -- yourapp 

The downside here is that the call stack depth is more limited compared to the default DWARF unwinder configuration.

milianw answered Oct 05 '22 22:10
