
The role of kernel.kallsyms when running a C++ application

I compiled my C++ code with the following switches:

g++ -O0 -g -rdynamic -DNDEBUG -DARMA_NO_DEBUG -std=c++11 -pthread

The linker switches are:

-lboost_system -lboost_thread -lboost_chrono -larmadillo -pthread

But I use no threads in my application. I also avoid using any delay functions.

Then I run the code and test it with perf tools.

sudo perf record ./bin/my_application
sudo perf report -f

The result is strange to me:

Overhead  Command          Shared Object     Symbol
 50.92%  myApplication  [kernel.kallsyms]  [k] native_sched_clock
 24.73%  myApplication  [kernel.kallsyms]  [k] pick_next_entity
 17.46%  myApplication  [kernel.kallsyms]  [k] prepend_name
  2.57%  myApplication  myApplication      [.] arma::arrayops::copy_small<double>
  1.11%  myApplication  myApplication      [.] arma::Mat<double>::Mat
  1.11%  myApplication  myApplication      [.] myClass::myMethod
  1.11%  myApplication  libblas.so.3       [.] dgemv_
  0.97%  myApplication  myApplication      [.] arma::Mat<double>::init_cold

Why are kernel.kallsyms functions dominating the execution time?

What does each of native_sched_clock, pick_next_entity, and prepend_name do for my application?

Kejoori asked Oct 18 '22 11:10



1 Answer

Your application is just too fast and short-running to be profiled with the default sampling frequency of perf record. All lines marked [k] and [kernel.kallsyms] come from the kernel doing service jobs such as loading your binary and scheduling threads/processes. Results may also be skewed when you use perf on a virtualized platform such as Xen or KVM, because most virtualized environments give the guest kernel no access to hardware performance counters (AWS sometimes gives a basic subset, cycles and instructions, on isolated instances); in that case perf falls back to software timer interrupts.

Try adding a loop around the code you want to measure (repeat it 100 or 1000 times) and/or increase the size of the data being processed. Your program should run for at least several seconds.
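A minimal sketch of such a repeat loop, assuming the code you want to measure can be wrapped in a single function. The names `workload` and `run_repeated` are hypothetical stand-ins, not part of your program; replace the body of `workload` with the real call (for example `myClass::myMethod`). It compiles with `-std=c++11`.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the code under test. It does a small amount of
// floating-point work and returns the result so that the work is observable.
double workload(std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<double>(i) * 0.5;
    }
    return acc;
}

// Calls workload(n) `repeats` times and returns the elapsed wall-clock time
// in seconds. Results are accumulated into `sink` so that an optimizing
// build cannot drop the calls as dead code.
double run_repeated(std::size_t repeats, std::size_t n, double& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repeats; ++r) {
        sink += workload(n);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Call `run_repeated` from `main` and raise `repeats` (or `n`) until the returned time reaches several seconds, then print `sink` at the end so the result is actually used. With a run that long, the user-space samples should outweigh the kernel startup work in the perf report.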

You may also run perf stat ./program to get timing values and basic hardware performance counts for the program (if counters are supported), and post the results.

osgx answered Nov 15 '22 04:11
