Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Profiling sleep times with perf

I was looking for a way to find out where my program spends time. I read the perf tutorial and tried to profile sleep times as it is described there. I wrote the simplest possible program to profile:

#include <unistd.h>
int main() {
  sleep(10);
  return 0; 
}

then I executed it with perf:

$ sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/perf.data.raw ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB /home/pablo/perf.data.raw (~578 samples) ]
$ sudo perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
build id event received for [kernel.kallsyms]: d62870685909222126e7070d2bafdf029f7ed3b6
failed to write feature 2
$ sudo perf report --stdio --show-total-period -i ~/perf.data
Error:
The /home/pablo/perf.data file has no samples!

Does anybody know how to avoid these errors? What do they mean? failed to write feature 2 doesn't look too user-friendly...

Update:

$ uname -a
Linux debian 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux
like image 494
Pavel Davydov Avatar asked Feb 13 '14 13:02

Pavel Davydov


People also ask

How do I read a perf data file?

There is builtin perf. data parser and printer in perf tool of linux tools with subcommand "script". perf-script - Read perf. data (created by perf record) and display trace output This command reads the input file and displays the trace recorded.

What is perf record?

A perf record command was used to trace the block:block_rq_issue probe, which fires when a block device I/O request is issued (disk I/O). Options included -a to trace all CPUs, and -g to capture call graphs (stack traces). Trace data is written to a perf. data file, and tracing ended when Ctrl-C was hit.

What can perf do?

The perf command can report combinations of performance counters as performance metrics and can inject small scripts (BPF scripts) into the kernel to process this data in real time.

What is perf command in linux?

The Linux perf tool is a lightweight command-line utility for profiling and monitoring CPU performance on Linux systems. Although the tool is simple, it provides in-depth information that helps in analyzing CPUs. The command contains many subcommands for collecting, tracing, and analyzing CPU event data.


3 Answers

There is a error message from your second perf command from https://perf.wiki.kernel.org/index.php/Tutorial#Profiling_sleep_times - perf inject -s

$ sudo perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
build id event received for [kernel.kallsyms]: d62870685909222126e7070d2bafdf029f7ed3b6
failed to write feature 2

failed to write feature 2 doesn't look too user-friendly...

... but it was added to perf to made errors more user-friendly: http://lwn.net/Articles/460520/ "perf: make perf.data more self-descriptive (v5)" by Stephane Eranian , 22 Sep 2011:

+static int do_write_feat(int fd, struct perf_header *h, int type,  ....
+           pr_debug("failed to write feature %d\n", type);

All features are listed here http://lxr.free-electrons.com/source/tools/perf/util/header.h#L13

 15         HEADER_TRACING_DATA     = 1,
 16         HEADER_BUILD_ID,

So, it sounds like perf inject was not able to write information about build ids (error from function write_build_id() from util/header.c) if I'm not wrong. There are two cases which can lead to error: unsuccessful call to perf_session__read_build_ids() or failing in writing buildid table dsos__write_buildid_table (this is not our case because there is no "failed to write buildid table" error message; check write_build_id)

You may check, do you have all buildids needed for the session. Also it may be useful to clear your buildid cache (rm -rf ~/.debug), and check that you have up-to-date vmlinux with debugging info or kallsyms enabled in your kernel.

UPDATE: in comments Pavel says that his pref record had no any sched:sched_stat_sleep events written to perf.data:

sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/perf.data.raw ./a.out

As he explains in his answer, his default debian kernel have CONFIG_SCHEDSTATS option disabled with vendor's patch. The redhat did the same thing with the option in release kernels since 3.11, and this is explained in Redhat Bug 1013225 (Josh Boyer 2013-10-28, comment 4):

We switched to enabling that only on debug builds a while ago. It seems that was turned off entirely with the final 3.11.0 build and has remained off since. Internal testing shows the option has a non-trivial performance impact for context switches.

We can turn this on in debug kernels again, but I'm not sure it's worthwhile.

Josh Poimboeuf 2013-11-04 in comment 8 says that performance impact is detectable:

In my tests I did a lot of context switches under various CPU loads. I saw a ~5-10% drop in average context switch speed when CONFIG_SCHEDSTATS was enabled. ...The performance hit only seemed to happen on post-CFS kernels (>= 2.6.23). The previous O(1) scheduler didn't seem to have this issue.

Fedora disabled CONFIG_SCHEDSTAT in non-debug kernels at 12 July 2013 "[kernel] Disable LATENCYTOP/SCHEDSTATS in non-debug builds." by Dave Jones. First kernel with disabled option: 3.11.0-0.rc0.git6.4.

In order to use any perf software tracepoint event with name like sched:sched_stat_* (sched:sched_stat_wait, sched:sched_stat_sleep, sched:sched_stat_iowait) we must recompile kernel with CONFIG_SCHEDSTATS option enabled and replace default Debian, RedHat or Fedora kernels which have no this option.

Thank you, Pavel Davydov.

like image 198
osgx Avatar answered Oct 14 '22 00:10

osgx


I finally found out how to make it work. The problem was that the default debian kernel is built without some config options, that perf needs to be able to monitor sleep times. It looks like CONFIG_SCHEDSTATS should be enabled to make kernel collect scheduler statistics. This is told to have some runtime overhead. Also I enabled CONFIG_SCHED_TRACER and some lock tracing options, but I'm not sure if they matter in my case. Anyway, no statistic data is collected in scheduler without CONFIG_SCHEDSTATS (see kernel/sched/ directory of kernel source).

Also, there is a very good article about perf written by Brendan Gregg, with a lot of usefull examples and some kernel options that are needed to make perf work properly.

Update: I checked the history of CONFIG_SCHEDSTATS in debian. I've checked out debian kernel patches and build scripts repo:

svn checkout svn://svn.debian.org/svn/kernel/dists/trunk/linux/debian

And then found CONFIG_SCHEDSTATS option there

$ grep -R CONFIG_SCHEDSTAT config/
config/config:# CONFIG_SCHEDSTATS is not set

This string was added to the repo in commit 10837, on 2008-03-14, with comment "debian/config: Do complete reorganization". Also, in this and this (thanks to osgx) bug reports it is told that CONFIG_LATENCYTOP, CONFIG_SCHEDSTATS options are not enabled because they can affect kernel perfomance. So, I think it just was never switched on in default debian kernels. I haven't found the discussion about scheduler stats option, though. If I do, I will write back here.

like image 35
Pavel Davydov Avatar answered Oct 13 '22 23:10

Pavel Davydov


This works for me for "perf version 3.11.1" on an "openSUSE 13.1 (x86_64)" box.

Here is the output if you care:

# ========
# captured on: Sun Feb 16 09:49:38 2014
# hostname : *****************
# os release : 3.11.10-7-desktop
# perf version : 3.11.1
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
# cpuid : GenuineIntel,6,58,9
# total memory : 32945368 kB
# cmdline : /usr/bin/perf inject -v -s -i perf.data.raw -o perf.data 
# event : name = sched:sched_stat_sleep, type = 2, config = 0x48, config1 = 0x0, config2 = 0x
# event : name = sched:sched_switch, type = 2, config = 0x51, config1 = 0x0, config2 = 0x0, e
# event : name = sched:sched_process_exit, type = 2, config = 0x4e, config1 = 0x0, config2 = 
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7, 
# ========
#
# Samples: 0  of event 'sched:sched_stat_sleep'
# Event count (approx.): 0
#
# Overhead        Period  Command  Shared Object  Symbol
# ........  ............  .......  .............  ......
#


# Samples: 8  of event 'sched:sched_switch'
# Event count (approx.): 80099958776
#
# Overhead        Period  Command      Shared Object             Symbol
# ........  ............  .......  .................  .................
#
   100.00%   80099958776      bla  [kernel.kallsyms]  [k] thread_return
                |
                --- thread_return
                    thread_return
                    do_nanosleep
                    hrtimer_nanosleep
                    SyS_nanosleep
                    system_call_fastpath
                    0x7fbc0dec6570
                    __GI___libc_nanosleep
                    (nil)



# Samples: 0  of event 'sched:sched_process_exit'
# Event count (approx.): 0
#
# Overhead        Period  Command  Shared Object  Symbol
# ........  ............  .......  .............  ......
#


#
# (For a higher level overview, try: perf report --sort comm,dso)
#
}
like image 36
hivert Avatar answered Oct 13 '22 22:10

hivert