what does "iterations" mean in "perf report -b --branch-history" (of perf record -b -g)

Question

I'm profiling a toy program (selection sort) using perf and I wonder what iterations correspond to in the perf report output. The addresses it show correspond to the inner loop and if statement. I hope somebody can help. Also, the basic block cycles column disappear when I use " -b --branch-history" with perf. I don't know why.

This is the portion of my code getting sampled (MAX_LENGTH is 500):

   35 // FROM: https://www.geeksforgeeks.org/selection-sort
   37 void swap(int *xp, int *yp)
   38 {
   39     int temp = *xp;
   40     *xp = *yp;
   41     *yp = temp;
   42 }
   43       
   44 void selection_sort(int arr[])
   45 {
   46     int i, j, min_idx;
   47
   48     // One by one move boundary of unsorted subarray
   49     for (i = 0; i < MAX_LENGTH-1; i++)
   50     {
   51         // Find the minimum element in unsorted array
   52         min_idx = i;
   53         for (j = i+1; j < MAX_LENGTH; j++)
   54           if (arr[j] < arr[min_idx])
   55             min_idx = j;
   56
   57         // Swap the found minimum element with the first element
   58         swap(&arr[min_idx], &arr[i]);
   59     }
   60 }

compiled using (clang version 5.0.0):

clang -O0 -g selection_sort.c -o selection_sort_g_O0

Here's how I invoke perf record:

sudo perf record -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=1009/pp -b -g ./selection_sort_g_O0

perf report and its output:

sudo perf report -b --branch-history --no-children

Samples: 376  of event 'br_inst_retired_near_taken', Event count (approx.): 37603384
  Overhead  Source:Line               Symbol                  Shared Object                                                                                                                                         ▒
+   51.86%  selection_sort_g_O0[862]  [.] 0x0000000000000862  selection_sort_g_O0                                                                                                                                   ▒
-   24.47%  selection_sort_g_O0[86e]  [.] 0x000000000000086e  selection_sort_g_O0                                                                                                                                   ▒
     0x873 (cycles:1)                                                                                                                                                                                               ▒
   - 0x86e (cycles:1)                                                                                                                                                                                               ▒
      - 23.94% 0x86e (cycles:3 iterations:25)                                                                                                                                                                       ▒
           0x862 (cycles:3)                                                                                                                                                                                         ▒
           0x83f (cycles:1)                                                                                                                                                                                         ▒
           0x87c (cycles:1)                                                                                                                                                                                         ▒
           0x873 (cycles:1)                                                                                                                                                                                         ▒
           0x86e (cycles:1)                                                                                                                                                                                         ▒
           0x86e (cycles:3)                                                                                                                                                                                         ▒
           0x862 (cycles:3)                                                                                                                                                                                         ▒
           0x83f (cycles:1)                                                                                                                                                                                         ▒
           0x87c (cycles:1)                                                                                                                                                                                         ▒
           0x873 (cycles:1)                                                                                                                                                                                         ▒
           0x86e (cycles:1)                                                                                                                                                                                         ▒
           0x86e (cycles:3)                                                                                                                                                                                         ▒
           0x862 (cycles:3)                                                                                                                                                                                         ▒
+   22.61%  selection_sort_g_O0[87c]  [.] 0x000000000000087c  selection_sort_g_O0                                                                                                                                   ▒
+    1.06%  selection_sort_g_O0[8a5]  [.] 0x00000000000008a5  selection_sort_g_O0

I used objdump for a mapping between addresses and source file lines:

objdump -Dleg selection_sort_g_O0 > selection_sort_g_O0.s

../selection_sort.c:53
 836:   8b 45 f4                mov    -0xc(%rbp),%eax
 839:   83 c0 01                add    $0x1,%eax
 83c:   89 45 f0                mov    %eax,-0x10(%rbp)
 83f:   81 7d f0 f4 01 00 00    cmpl   $0x1f4,-0x10(%rbp)
 846:   0f 8d 35 00 00 00       jge    881 <selection_sort+0x71>
../selection_sort.c:54
 84c:   48 8b 45 f8             mov    -0x8(%rbp),%rax
 850:   48 63 4d f0             movslq -0x10(%rbp),%rcx
 854:   8b 14 88                mov    (%rax,%rcx,4),%edx
 857:   48 8b 45 f8             mov    -0x8(%rbp),%rax
 85b:   48 63 4d ec             movslq -0x14(%rbp),%rcx
 85f:   3b 14 88                cmp    (%rax,%rcx,4),%edx
 862:   0f 8d 06 00 00 00       jge    86e <selection_sort+0x5e>
../selection_sort.c:55
 868:   8b 45 f0                mov    -0x10(%rbp),%eax
 86b:   89 45 ec                mov    %eax,-0x14(%rbp)
../selection_sort.c:54
 86e:   e9 00 00 00 00          jmpq   873 <selection_sort+0x63>
../selection_sort.c:53
 873:   8b 45 f0                mov    -0x10(%rbp),%eax
 876:   83 c0 01                add    $0x1,%eax
 879:   89 45 f0                mov    %eax,-0x10(%rbp)
 87c:   e9 be ff ff ff          jmpq   83f <selection_sort+0x2f>

Arnabjyoti Kalita · Accepted Answer

I will try to reiterate and add some more information on top of Zulan's answer.

Last Branch Records (LBRs) allow finding the hot execution paths in an executable to examine them directly for optimization opportunities. In perf this is implemented by extending the call-stack display mechanism and adding the last basic blocks into the call stack, which is normally used to display the most common hierarchy of function calls.

This can be done by using the call graph (-g) and LBR (-b) options in perf record and the --branch-history option in perf report, which adds the last branch information to the call graph. Essentially it gives 8-32 branches extra context of why something happened.

The timed LBR feature in recent perf versions reports the average number of cycles per basic blocks.

What is Iterations ?

From what I can understand, the branch history code has a loop detection function. This allows us to obtain the number of iterations by calculating the number of removed loops. The removal of repeated loops was introduced only in perf report output (to display it in a histogram format) via a previous commit in the Linux kernel.

struct iterations is a useful C struct to be used to display the number of iterations in perf report.

This is where the number of iterations are being saved to be displayed in your perf report output. The save_iterations function is being called from inside the remove_loops function.

The loops are being removed at the time of resolving the callchain.

You can also read this commit which describes how the perf report displays the number of iterations and changes that have been introduced in the newer Linux kernel versions.

what does "iterations" mean in "perf report -b --branch-history" (of perf record -b -g)

Tags:

branch-prediction

x86

assembly

perf

drunk teapot

1 Answers

Arnabjyoti Kalita

Recent Activity

Donate For Us

what does "iterations" mean in "perf report -b --branch-history" (of perf record -b -g)

Tags:

branch-prediction

x86

assembly

perf

drunk teapot

1 Answers

Arnabjyoti Kalita

Related questions

Recent Activity

Donate For Us