linux perf: how to interpret and find hotspots

Tags: c++, performance, linux, profiling, perf

I tried out Linux's perf utility today and am having trouble interpreting its results. I'm used to valgrind's callgrind, which is of course a totally different approach from perf's sampling-based method.

What I did:

    perf record -g -p $(pidof someapp)
    perf report -g -n

Now I see something like this:

    +     16.92%  kdevelop  libsqlite3.so.0.8.6  [.] 0x3fe57
    +     10.61%  kdevelop  libQtGui.so.4.7.3    [.] 0x81e344
    +      7.09%  kdevelop  libc-2.14.so         [.] 0x85804
    +      4.96%  kdevelop  libQtGui.so.4.7.3    [.] 0x265b69
    +      3.50%  kdevelop  libQtCore.so.4.7.3   [.] 0x18608d
    +      2.68%  kdevelop  libc-2.14.so         [.] memcpy
    +      1.15%  kdevelop  [kernel.kallsyms]    [k] copy_user_generic_string
    +      0.90%  kdevelop  libQtGui.so.4.7.3    [.] QTransform::translate(double, double)
    +      0.88%  kdevelop  libc-2.14.so         [.] __libc_malloc
    +      0.85%  kdevelop  libc-2.14.so         [.] memcpy
    ...

OK, these functions might be slow, but how do I find out where they are being called from? Since all these hotspots lie in external libraries, I see no way to optimize my code.

Basically I am looking for some kind of callgraph annotated with accumulated cost, where my functions have a higher inclusive sampling cost than the library functions I call.

Is this possible with perf? If so - how?

Note: I found out that "E" unwraps the callgraph and gives somewhat more information. But the callgraph is often not deep enough and/or terminates randomly without showing how much time was spent where. Example:

    -     10.26%  kate  libkatepartinterfaces.so.4.6.0  [.] Kate::TextLoader::readLine(int&...
          Kate::TextLoader::readLine(int&, int&)
          Kate::TextBuffer::load(QString const&, bool&, bool&)
          KateBuffer::openFile(QString const&)
          KateDocument::openFile()
          0x7fe37a81121c

Could it be an issue that I'm running on 64-bit? See also: http://lists.fedoraproject.org/pipermail/devel/2010-November/144952.html (I'm not using Fedora, but it seems to apply to all 64-bit systems).

asked Aug 11 '11 18:08

milianw

1 Answer

With Linux 3.7, perf is finally able to use DWARF information to generate the callgraph:

    perf record --call-graph dwarf -- yourapp
    perf report -g graph --no-children

Neat, but the curses GUI is horrible compared to VTune, KCacheGrind or similar... I recommend trying out FlameGraphs instead, which are a pretty neat visualization: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
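For reference, here is a minimal sketch of how such a flame graph is usually produced from an existing perf.data file. It assumes the stackcollapse-perf.pl and flamegraph.pl scripts from Brendan Gregg's FlameGraph repository are checked out locally. The function name make_flamegraph, the FLAMEGRAPH_DIR variable and its default location are hypothetical names chosen for illustration; they are not part of perf.

```shell
# Sketch: turn a recorded perf.data into an SVG flame graph.
# Assumption: the FlameGraph scripts live in $FLAMEGRAPH_DIR
# (hypothetical variable; defaults to ~/FlameGraph here).
make_flamegraph() {
    perf_data=${1:-perf.data}   # input recorded with "perf record --call-graph ..."
    out_svg=${2:-perf.svg}      # output file, viewable in a browser
    fg_dir=${FLAMEGRAPH_DIR:-$HOME/FlameGraph}

    # perf script dumps the raw samples with their call stacks,
    # stackcollapse-perf.pl folds each stack into one "a;b;c count" line,
    # flamegraph.pl renders the folded stacks as an SVG.
    perf script -i "$perf_data" \
        | "$fg_dir/stackcollapse-perf.pl" \
        | "$fg_dir/flamegraph.pl" > "$out_svg"
}
```

Calling make_flamegraph without arguments reads perf.data from the current directory and writes perf.svg next to it.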

Note: In the report step, -g graph makes the output show easy-to-understand "relative to total" percentages rather than "relative to parent" numbers. --no-children shows only the self cost of each entry rather than its inclusive cost, a feature that I also find invaluable.
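To make the difference between self and inclusive cost concrete, here is a small sketch that computes both from folded stacks, i.e. lines of the form "frame1;frame2;... count" as emitted by the FlameGraph stackcollapse scripts. The helper name stack_costs is hypothetical, and this is only an illustration of the two metrics, not how perf computes its report internally.

```shell
# Sketch: per-function self and inclusive cost, both relative to the total.
# Input (stdin): folded stacks, one per line, "outer;...;leaf <sample count>".
# Output: "<function> TAB <self %> TAB <inclusive %>", sorted by function name.
stack_costs() {
    awk '
    {
        count = $NF                        # last field is the sample count
        stack = $0
        sub(/[ \t]+[0-9]+$/, "", stack)    # strip the trailing count
        n = split(stack, frames, ";")
        total += count
        self[frames[n]] += count           # only the leaf frame gets self cost
        split("", seen)                    # portable way to clear an array
        for (i = 1; i <= n; i++) {
            if (!(frames[i] in seen)) {    # count recursive frames once per stack
                incl[frames[i]] += count
                seen[frames[i]] = 1
            }
        }
    }
    END {
        for (f in incl)
            printf "%s\t%.1f\t%.1f\n", f, 100 * self[f] / total, 100 * incl[f] / total
    }' | LC_ALL=C sort
}
```

For example, feeding it the three stacks "main;load;readLine 6", "main;load;memcpy 3" and "main;paint 1" reports load with 0% self cost but 90% inclusive cost, which is exactly the kind of "my function is expensive because of what it calls" information asked for in the question.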

If you have a recent perf and an Intel CPU, also try out the LBR unwinder, which has much better performance and produces far smaller result files:

    perf record --call-graph lbr -- yourapp

The downside here is that the call stack depth is more limited compared to the default DWARF unwinder configuration.

answered Oct 05 '22 22:10

milianw
