
How to resolve "not counted" in perf?

perf stat -d ./sample.out

Output is:

Performance counter stats for './sample.out':

          0.586266 task-clock (msec)         #    0.007 CPUs utilized          
                 2 context-switches          #    0.003 M/sec                  
                 1 cpu-migrations            #    0.002 M/sec                  
               116 page-faults               #    0.198 M/sec                  
          7,35,790 cycles                    #    1.255 GHz                     [81.06%]
     <not counted> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
     <not counted> instructions            
     <not counted> branches                
     <not counted> branch-misses           
   <not supported> L1-dcache-loads:HG      
     <not counted> L1-dcache-load-misses:HG
     <not counted> LLC-loads:HG            
   <not supported> LLC-load-misses:HG      

       0.088013919 seconds time elapsed

I have read why <not supported> shows up. But I am getting <not counted> even for basic counters like instructions, branches, etc. Can anyone suggest how to make it work?

The interesting thing is:

sudo perf stat sleep 3

gives output:

Performance counter stats for 'sleep 3':

          0.598484 task-clock (msec)         #    0.000 CPUs utilized          
                 2 context-switches          #    0.003 M/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
               181 page-faults               #    0.302 M/sec                  
     <not counted> cycles                  
     <not counted> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
     <not counted> instructions            
     <not counted> branches                
     <not counted> branch-misses

sudo perf stat -C 1 sleep 3

 Performance counter stats for 'CPU(s) 1':

       3002.640578 task-clock (msec)         #    1.001 CPUs utilized           [100.00%]
               425 context-switches          #    0.142 K/sec                   [100.00%]
                 9 cpu-migrations            #    0.003 K/sec                   [100.00%]
                 5 page-faults               #    0.002 K/sec                  
       7,82,97,019 cycles                    #    0.026 GHz                     [33.32%]
       9,38,21,585 stalled-cycles-frontend   #  119.83% frontend cycles idle    [33.32%]
   <not supported> stalled-cycles-backend  
       3,09,81,643 instructions              #    0.40  insns per cycle        
                                             #    3.03  stalled cycles per insn [33.32%]
         70,15,390 branches                  #    2.336 M/sec                   [33.32%]
          6,38,644 branch-misses             #    9.10% of all branches         [33.32%]

       3.001075650 seconds time elapsed

Why does this unexpectedly work?

Thank you

ANTHONY asked Dec 19 '22 17:12


1 Answer

The typical problem with perf stat -d for very short programs is not statistical sampling but multiplexing. The percentage in square brackets (for example [33.32%]) means that the counter was actually counting for only about 33% of the running time.

You are asking the PMU (the performance monitoring unit of the CPU) to monitor too many events at once, and perf cannot map all the requested events onto real hardware counters at the same time. A typical PMU has something like 4, 7, or 8 independent counters, and that number may be halved if some SMT technology is enabled (for example, HT, Hyper-Threading).

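When you ask perf to count that many events (you have 6 supported HW events in your perf stat output), it divides them into smaller groups. The kernel rotates these groups at certain points in time, whenever perf_events gets a chance to switch them, for example on a task-clock tick (~3 ms). Your program ran for only about 0.59 ms of task-clock, so it likely finished before any rotation happened, and the events that never got a turn on the hardware are reported as <not counted>.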
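As a rough illustration of what the bracketed percentage means: when events are multiplexed, perf reports an estimate scaled up from the time the counter was actually running, roughly raw count × time enabled / time running. If an event never ran at all, there is nothing to scale and perf prints <not counted>. The helper below is only a sketch of that arithmetic, not perf's own code; the function name scale_count and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# scale_count RAW ENABLED_NS RUNNING_NS
# Hypothetical helper illustrating the idea behind the multiplexing estimate:
#   scaled = raw * time_enabled / time_running
# Prints "<not counted>" when the event never ran on the hardware.
scale_count() {
    awk -v raw="$1" -v enabled="$2" -v running="$3" 'BEGIN {
        if (running == 0) { print "<not counted>"; exit }
        printf "%.0f\n", raw * enabled / running
    }'
}

# Event that was scheduled for one third of a 3 ms run (illustrative numbers).
scale_count 1000 3000000 1000000    # prints 3000
# Event that never got a hardware counter during a very short run.
scale_count 0 500000 0              # prints <not counted>
```

The second call corresponds to the situation in the question: the run is so short that some events are never scheduled, so no estimate can be produced for them.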

You can split your measurement into several runs, each with a smaller set of events: any number of SW events plus 2-4 HW events per run:

perf stat -e task-clock,page-faults,cycles,stalled-cycles-frontend ./sample.out
perf stat -e task-clock,page-faults,cycles,instructions ./sample.out
perf stat -e task-clock,page-faults,branches,branch-misses ./sample.out
perf stat -e task-clock,page-faults,L1-dcache-load-misses:HG,LLC-loads:HG ./sample.out
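If you do this often, the separate runs can be generated by a small script. The following is a minimal sketch that reuses the event groups listed above; the function name print_perf_runs is hypothetical, and ./sample.out is the program from the question. It only prints the commands (a dry run), so you can check them before executing anything.

```shell
#!/bin/sh
# Software events are counted by the kernel and do not compete for PMU counters,
# so they can be added to every run.
SW_EVENTS="task-clock,page-faults"

# print_perf_runs PROGRAM
# Hypothetical helper: emit one perf stat command per small HW event group.
print_perf_runs() {
    for hw in \
        "cycles,stalled-cycles-frontend" \
        "cycles,instructions" \
        "branches,branch-misses" \
        "L1-dcache-load-misses:HG,LLC-loads:HG"
    do
        printf 'perf stat -e %s,%s %s\n' "$SW_EVENTS" "$hw" "$1"
    done
}

# Dry run: show the four commands for the program from the question.
print_perf_runs ./sample.out
```

To actually run the measurements, the output can be piped into a shell (print_perf_runs ./sample.out | sh), assuming the program path contains no spaces or shell metacharacters.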

osgx answered Dec 26 '22 12:12