
Linux application profiling

I need some means of recording the performance of an application on a Linux machine. I won't have an IDE.

Ideally, I need an app that will attach to a process and log periodic snapshots of: memory usage, number of threads, and CPU usage.

Any ideas?

MalcomTucker asked Feb 09 '10 13:02



2 Answers

Ideally, I need an app that will attach to a process and log periodic snapshots of: memory usage, number of threads, and CPU usage.

Well, to collect this type of information about your process you don't actually need a profiler on Linux.
1) You can use top in batch mode. It runs in batch mode either until it is killed or until N iterations are done:

top -b -p `pidof a.out` 

or

top -b -p `pidof a.out` -n 100 

and you will get this:

$ top -b -p `pidof a.out`
top - 10:31:50 up 12 days, 19:08,  5 users,  load average: 0.02, 0.01, 0.02
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16330584k total,  2335024k used, 13995560k free,   241348k buffers
Swap:  4194296k total,        0k used,  4194296k free,  1631880k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24402 SK        20   0 98.7m 1056  860 S 43.9  0.0   0:11.87 a.out

top - 10:31:53 up 12 days, 19:08,  5 users,  load average: 0.02, 0.01, 0.02
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  3.7%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16330584k total,  2335148k used, 13995436k free,   241348k buffers
Swap:  4194296k total,        0k used,  4194296k free,  1631880k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24402 SK        20   0 98.7m 1072  860 S 19.0  0.0   0:12.44 a.out

2) You can use ps (for instance in a shell script)

ps --format pid,pcpu,cputime,etime,size,vsz,cmd -p `pidof a.out` 
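As a rough sketch of the "shell script" idea, the same three numbers can also be read straight from /proc, which is where ps and top get them. The function names below (cpu_ticks, status_field, snapshot, monitor) are made up for this illustration, and the script assumes a Linux /proc filesystem plus a POSIX sh with awk and date available:

```shell
#!/bin/sh
# cpu_ticks PID: total CPU time (user + system) consumed so far, in clock ticks.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Drop the leading "pid (comm) " part; comm may itself contain spaces.
    set -- ${stat##*) }
    # utime and stime are fields 14 and 15 of the full line, so 12 and 13 here.
    shift 11
    echo $(( $1 + $2 ))
}

# status_field PID NAME: value of the "NAME:" line in /proc/PID/status.
status_field() {
    awk -v key="$2:" '$1 == key { print $2 }' "/proc/$1/status"
}

# snapshot PID: print "epoch_seconds rss_kB threads cpu_ticks" for one process.
snapshot() {
    ticks=$(cpu_ticks "$1") || return 1
    echo "$(date +%s) $(status_field "$1" VmRSS) $(status_field "$1" Threads) $ticks"
}

# monitor PID [INTERVAL [COUNT]]: print one snapshot every INTERVAL seconds.
monitor() {
    pid=$1; interval=${2:-5}; count=${3:-12}
    while [ "$count" -gt 0 ]; do
        snapshot "$pid" || break    # stop once the process has exited
        count=$((count - 1))
        sleep "$interval"
    done
}

# Run as a script: ./monitor.sh PID [INTERVAL [COUNT]]
if [ "$#" -gt 0 ]; then
    monitor "$@"
fi
```

Saved as, say, monitor.sh (a hypothetical file name), it could be run as sh monitor.sh `pidof a.out` 5 100 >> a.out.log. The last column is cumulative, so CPU usage over an interval is the difference between two consecutive values divided by the interval length times the tick rate reported by getconf CLK_TCK. Note that the pcpu column printed by ps is an average over the whole lifetime of the process, whereas top reports usage per refresh interval.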

I need some means of recording the performance of an application on a Linux machine

To do this, use perf if your Linux kernel is 2.6.32 or newer, or OProfile if it is older. Neither tool requires you to instrument your program (as gprof does). However, to get correct call graphs in perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.

As for Linux perf:

1) To record performance data:

perf record -p `pidof a.out` 

or to record for 10 secs:

perf record -p `pidof a.out` sleep 10 

or to record with call graph:

perf record -g -p `pidof a.out`  

2) To analyze the recorded data:

perf report --stdio
perf report --stdio --sort=dso -g none
perf report --stdio -g none
perf report --stdio -g

On RHEL 6.3 reading /boot/System.map-2.6.32-279.el6.x86_64 is allowed, so I usually add --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64 when I run perf report:

perf report --stdio -g --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64 


Here I wrote some more information on using Linux perf:

First of all, this is a tutorial about Linux profiling with perf.

You can use perf if your Linux kernel is 2.6.32 or newer, or OProfile if it is older. Neither tool requires you to instrument your program (as gprof does). However, to get correct call graphs in perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp. You can see a "live" analysis of your application with perf top:

sudo perf top -p `pidof a.out` -K 

Or you can record performance data of a running application and analyze it afterwards:

1) To record performance data:

perf record -p `pidof a.out` 

or to record for 10 secs:

perf record -p `pidof a.out` sleep 10 

or to record with call graph:

perf record -g -p `pidof a.out`  

2) To analyze the recorded data:

perf report --stdio
perf report --stdio --sort=dso -g none
perf report --stdio -g none
perf report --stdio -g

Or you can record performance data for a whole run of an application by launching it under perf and waiting for it to exit, then analyze the data afterwards:

perf record ./a.out 

This is an example of profiling a test program. The test program is in the file main.cpp (I will put main.cpp at the bottom of the message). I compile it this way:

g++ -m64 -fno-omit-frame-pointer -g main.cpp -L.  -ltcmalloc_minimal -o my_test 

I use libtcmalloc_minimal.so since it is compiled with -fno-omit-frame-pointer, while libc malloc seems to be compiled without this option. Then I run my test program:

./my_test 100000000  

Then I record performance data of a running process:

perf record -g  -p `pidof my_test` -o ./my_test.perf.data sleep 30 

Then I analyze load per module:

perf report --stdio -g none --sort comm,dso -i ./my_test.perf.data

# Overhead  Command                 Shared Object
# ........  .......  ............................
#
    70.06%  my_test  my_test
    28.33%  my_test  libtcmalloc_minimal.so.0.1.0
     1.61%  my_test  [kernel.kallsyms]

Then load per function is analyzed:

perf report --stdio -g none -i ./my_test.perf.data | c++filt

# Overhead  Command                 Shared Object                       Symbol
# ........  .......  ............................  ...........................
#
    29.30%  my_test  my_test                       [.] f2(long)
    29.14%  my_test  my_test                       [.] f1(long)
    15.17%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator new(unsigned long)
    13.16%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator delete(void*)
     9.44%  my_test  my_test                       [.] process_request(long)
     1.01%  my_test  my_test                       [.] operator delete(void*)@plt
     0.97%  my_test  my_test                       [.] operator new(unsigned long)@plt
     0.20%  my_test  my_test                       [.] main
     0.19%  my_test  [kernel.kallsyms]             [k] apic_timer_interrupt
     0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock
     0.13%  my_test  [kernel.kallsyms]             [k] native_write_msr_safe

     and so on ...
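When the per-function report gets long, a small filter can cut it down to the entries that matter. The following is only a sketch: hot_symbols is a name invented here, and it assumes the column layout of the flat report (overhead, command, shared object, a [.] or [k] marker, then the symbol) and a command name without spaces.

```shell
#!/bin/sh
# hot_symbols MIN_PERCENT: read flat "perf report --stdio -g none" output on
# stdin and print "overhead<TAB>shared object<TAB>symbol" for every entry whose
# overhead is at least MIN_PERCENT.
hot_symbols() {
    awk -v min="$1" '
        /^#/ { next }                          # skip the header lines
        $1 ~ /^[0-9.]+%$/ {
            pct = $1
            sub(/%$/, "", pct)
            if (pct + 0 < min + 0) next        # below the threshold
            # A demangled symbol may contain spaces, so rejoin fields 5..NF.
            sym = $5
            for (i = 6; i <= NF; i++) sym = sym " " $i
            printf "%s\t%s\t%s\n", $1, $3, sym
        }'
}
```

After defining the function in the current shell, it can be appended to the pipeline: perf report --stdio -g none -i ./my_test.perf.data | c++filt | hot_symbols 5. Lines that do not start with a percentage, such as the trailing "and so on ..." here, are ignored.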

Then call chains are analyzed:

perf report --stdio -g graph -i ./my_test.perf.data | c++filt

# Overhead  Command                 Shared Object                       Symbol
# ........  .......  ............................  ...........................
#
    29.30%  my_test  my_test                       [.] f2(long)
            |
            --- f2(long)
               |
                --29.01%-- process_request(long)
                          main
                          __libc_start_main

    29.14%  my_test  my_test                       [.] f1(long)
            |
            --- f1(long)
               |
               |--15.05%-- process_request(long)
               |          main
               |          __libc_start_main
               |
                --13.79%-- f2(long)
                          process_request(long)
                          main
                          __libc_start_main

    15.17%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator new(unsigned long)
            |
            --- operator new(unsigned long)
               |
               |--11.44%-- f1(long)
               |          |
               |          |--5.75%-- process_request(long)
               |          |          main
               |          |          __libc_start_main
               |          |
               |           --5.69%-- f2(long)
               |                     process_request(long)
               |                     main
               |                     __libc_start_main
               |
                --3.01%-- process_request(long)
                          main
                          __libc_start_main

    13.16%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator delete(void*)
            |
            --- operator delete(void*)
               |
               |--9.13%-- f1(long)
               |          |
               |          |--4.63%-- f2(long)
               |          |          process_request(long)
               |          |          main
               |          |          __libc_start_main
               |          |
               |           --4.51%-- process_request(long)
               |                     main
               |                     __libc_start_main
               |
               |--3.05%-- process_request(long)
               |          main
               |          __libc_start_main
               |
                --0.80%-- f2(long)
                          process_request(long)
                          main
                          __libc_start_main

     9.44%  my_test  my_test                       [.] process_request(long)
            |
            --- process_request(long)
               |
                --9.39%-- main
                          __libc_start_main

     1.01%  my_test  my_test                       [.] operator delete(void*)@plt
            |
            --- operator delete(void*)@plt

     0.97%  my_test  my_test                       [.] operator new(unsigned long)@plt
            |
            --- operator new(unsigned long)@plt

     0.20%  my_test  my_test                       [.] main
     0.19%  my_test  [kernel.kallsyms]             [k] apic_timer_interrupt
     0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock

     and so on ...

So at this point you know where your program spends time. And this is main.cpp for the test:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Increments the value 10 times and allocates/frees a double twice (j = 0, 5).
time_t f1(time_t time_value)
{
  for (int j = 0; j < 10; ++j) {
    ++time_value;
    if (j % 5 == 0) {
      double *p = new double;
      delete p;
    }
  }
  return time_value;
}

// Increments the value 40 times, then calls f1().
time_t f2(time_t time_value)
{
  for (int j = 0; j < 40; ++j) {
    ++time_value;
  }
  time_value = f1(time_value);
  return time_value;
}

// Does some allocations of its own, then calls f1() and f2() 10 times each.
time_t process_request(time_t time_value)
{
  for (int j = 0; j < 10; ++j) {
    int *p = new int;
    delete p;
    for (int m = 0; m < 10; ++m) {
      ++time_value;
    }
  }
  for (int i = 0; i < 10; ++i) {
    time_value = f1(time_value);
    time_value = f2(time_value);
  }
  return time_value;
}

int main(int argc, char* argv[])
{
  // The number of iterations comes from the first argument (default: 1).
  int number_loops = argc > 1 ? atoi(argv[1]) : 1;
  time_t time_value = time(0);
  printf("number loops %d\n", number_loops);
  // Cast to long so the argument matches %ld whatever time_t is.
  printf("time_value: %ld\n", (long)time_value);

  for (int i = 0; i < number_loops; ++i) {
    time_value = process_request(time_value);
  }
  printf("time_value: %ld\n", (long)time_value);
  return 0;
}

user184968 (5 revs) answered Sep 20 '22 15:09


Quoting Linus Torvalds himself:

"Don't use gprof. You're _much_ better off using the newish Linux 'perf' tool." 

and later ...

"I can pretty much guarantee that once you start using it, you'll never use gprof or oprofile again." 

See: http://marc.info/?l=git&m=126262088816902&w=2

Good luck!

holygeek answered Sep 19 '22 15:09