Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

perf_event_open Overflow Signal

I want to count the (more or less) exact amount of instructions for some piece of code. Additionally, I want to receive a Signal after a specific amount of instructions passed.

For this purpose, I use the overflow signal behaviour provided by perf_event_open.

I'm using the second way the manpage proposes to achieve overflow signals:

Signal overflow

Events can be set to deliver a signal when a threshold is crossed. The signal handler is set up using the poll(2), select(2), epoll(2) and fcntl(2), system calls.

[...]

The other way is by use of the PERF_EVENT_IOC_REFRESH ioctl. This ioctl adds to a counter that decrements each time the event overflows. When nonzero, a POLL_IN signal is sent on overflow, but once the value reaches 0, a signal is sent of type POLL_HUP and the underlying event is disabled.

Further explanation of PERF_EVENT_IOC_REFRESH ioctl:

PERF_EVENT_IOC_REFRESH

Non-inherited overflow counters can use this to enable a counter for a number of overflows specified by the argument, after which it is disabled. Subsequent calls of this ioctl add the argument value to the current count. A signal with POLL_IN set will happen on each overflow until the count reaches 0; when that happens a signal with POLL_HUP set is sent and the event is disabled. Using an argument of 0 is considered undefined behavior.

A very minimal example would look like this:

#define _GNU_SOURCE 1

#include <asm/unistd.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

long perf_event_open(struct perf_event_attr* event_attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, event_attr, pid, cpu, group_fd, flags);
}

static void perf_event_handler(int signum, siginfo_t* info, void* ucontext) {
    if(info->si_code != POLL_HUP) {
        // Only POLL_HUP should happen.
        exit(EXIT_FAILURE);
    }

    ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1);
}

int main(int argc, char** argv)
{
    // Configure signal handler
    struct sigaction sa;
    memset(&sa, 0, sizeof(struct sigaction));
    sa.sa_sigaction = perf_event_handler;
    sa.sa_flags = SA_SIGINFO;

    // Setup signal handler
    if (sigaction(SIGIO, &sa, NULL) < 0) {
        fprintf(stderr,"Error setting up signal handler\n");
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    // Configure perf_event_attr struct
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;     // Count retired hardware instructions
    pe.disabled = 1;        // Event is initially disabled
    pe.sample_type = PERF_SAMPLE_IP;
    pe.sample_period = 1000;
    pe.exclude_kernel = 1;      // excluding events that happen in the kernel-space
    pe.exclude_hv = 1;          // excluding events that happen in the hypervisor

    pid_t pid = 0;  // measure the current process/thread
    int cpu = -1;   // measure on any cpu
    int group_fd = -1;
    unsigned long flags = 0;

    int fd = perf_event_open(&pe, pid, cpu, group_fd, flags);
    if (fd == -1) {
        fprintf(stderr, "Error opening leader %llx\n", pe.config);
        perror("perf_event_open");
        exit(EXIT_FAILURE);
    }

    // Setup event handler for overflow signals
    fcntl(fd, F_SETFL, O_NONBLOCK|O_ASYNC);
    fcntl(fd, F_SETSIG, SIGIO);
    fcntl(fd, F_SETOWN, getpid());

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);     // Reset event counter to 0
    ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);   // 

// Start monitoring

    long loopCount = 1000000;
    long c = 0;
    long i = 0;

    // Some sample payload.
    for(i = 0; i < loopCount; i++) {
        c += 1;
    }

// End monitoring

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);   // Disable event

    long long counter;
    read(fd, &counter, sizeof(long long));  // Read event counter value

    printf("Used %lld instructions\n", counter);

    close(fd);
}

So basically I'm doing the following:

  1. Set up a signal handler for SIGIO signals
  2. Create a new performance counter with perf_event_open (returns a file descriptor)
  3. Use fcntl to add signal sending behavior to the file descriptor.
  4. Run a payload loop to execute many instructions.

When executing the payload loop, at some point 1000 instructions (the sample_interval) will have been executed. According to the perf_event_open manpage this triggers an overflow which will then decrement an internal counter. Once this counter reaches zero, "a signal is sent of type POLL_HUP and the underlying event is disabled."

When a signal is sent, the control flow of the current process/thread is stopped, and the signal handler is executed. Scenario:

  1. 1000 instructions have been executed.
  2. Event is automatically disabled and a signal is sent.
  3. Signal is immediately delivered, control flow of the process is stopped and the signal handler is executed.

This scenario would mean two things:

  • The final amount of counted instructions would always be equal to an example which does not use signals at all.
  • The instruction pointer which has been saved for the signal handler (and can be accessed through ucontext) would directly point to the instruction which caused the overflow.

Basically you could say, the signal behavior can be seen as synchronous.

This is the perfect semantic for what I want to achieve.

However, as far as I'm concerned, the signal I configured is generally rather asynchronous and some time may pass until it is eventually delivered and the signal handler is executed. This may pose a problem for me.

For example, consider the following scenario:

  1. 1000 instructions have been executed.
  2. Event is automatically disabled and a signal is sent.
  3. Some more instructions pass
  4. Signal is delivered, control flow of the process is stopped and the signal handler is executed.

This scenario would mean two things:

  • The final amount of counted instructions would be less than an example which does not use signals at all.
  • The instruction pointer which has been saved for the signal handler would point to the instructions which caused the overflow or to any one after it.

So far, I've tested above example a lot and did not experience missed instructions which would support the first scenario.

However, I'd really like to know, whether I can rely on this assumption or not. What happens in the kernel?

like image 485
Dawodo Avatar asked Jun 29 '14 08:06

Dawodo


1 Answers

I want to count the (more or less) exact amount of instructions for some piece of code. Additionally, I want to receive a Signal after a specific amount of instructions passed.

You have two task which may conflict with each other. When you want to get counting (exact amounts of some hardware event), just use performance monitoring unit of your CPU in counting mode (don't set sample_period/sample_freq of perf_event_attr structure used) and place the measurement code in your target program (as it was done in your example). In this mode according to the man page of perf_event_open no overflows will be generated (CPU's PMU are usually 64-bit wide and don't overflow when not set to small negative value when sampling mode is used):

Overflows are generated only by sampling events (sample_period must a nonzero value).

To count part of program, use ioctls of perf_event_open returned fd as described in man page

perf_event ioctl calls - Various ioctls act on perf_event_open() file descriptors: PERF_EVENT_IOC_ENABLE ... PERF_EVENT_IOC_DISABLE ... PERF_EVENT_IOC_RESET

You can read current value with rdpmc (on x86) or by read syscall on the fd like in the short example from the man page:

   #include <stdlib.h>
   #include <stdio.h>
   #include <unistd.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <linux/perf_event.h>
   #include <asm/unistd.h>

   static long
   perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                   int cpu, int group_fd, unsigned long flags)
   {
       int ret;

       ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
                      group_fd, flags);
       return ret;
   }

   int
   main(int argc, char **argv)
   {
       struct perf_event_attr pe;
       long long count;
       int fd;

       memset(&pe, 0, sizeof(struct perf_event_attr));
       pe.type = PERF_TYPE_HARDWARE;
       pe.size = sizeof(struct perf_event_attr);
       pe.config = PERF_COUNT_HW_INSTRUCTIONS;
       pe.disabled = 1;
       pe.exclude_kernel = 1;
       pe.exclude_hv = 1;

       fd = perf_event_open(&pe, 0, -1, -1, 0);
       if (fd == -1) {
          fprintf(stderr, "Error opening leader %llx\n", pe.config);
          exit(EXIT_FAILURE);
       }

       ioctl(fd, PERF_EVENT_IOC_RESET, 0);
       ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

       printf("Measuring instruction count for this printf\n");
       /* Place target code here instead of printf */

       ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
       read(fd, &count, sizeof(long long));

       printf("Used %lld instructions\n", count);

       close(fd);
   }

Additionally, I want to receive a Signal after a specific amount of instructions passed.

Do you really want to get signal or you just need instruction pointers at every 1000 instructions executed? If you want to collect pointers, use perf_even_open with sampling mode, but do it from other program to disable measuring of the event collection code. Also, it will have less negative effect on your target program, if you will use not signals for every overflow (with huge amount of kernel-tracer interactions and switching from/to kernel), but instead use capabilities of perf_events to collect several overflow events into single mmap buffer and poll on this buffer. On overflow interrupt from PMU perf interrupt handler will be called to save the instruction pointer into buffer and then counting will be reset and program will return to execution. In your example, perf interrupt handler will woke your program, it will do several syscalls, return to kernel and then kernel will restart target code (so overhead per sample is greater than using mmap and parsing it). With precise_ip flag you may activate advanced sampling of your PMU (if it has such mode, like PEBS and PREC_DIST in intel x86/em64t for some counters like INST_RETIRED, UOPS_RETIRED, BR_INST_RETIRED, BR_MISP_RETIRED, MEM_UOPS_RETIRED, MEM_LOAD_UOPS_RETIRED, MEM_LOAD_UOPS_LLC_HIT_RETIRED and with simple hack to cycles too; or like IBS of AMD x86/amd64; paper about PEBS and IBS), when instruction address is saved directly by hardware with low skid. Some very advanced PMUs has ability to do sampling in hardware, storing overflow information of several events in row with automatic reset of counter without software interrupts (some descriptions on precise_ip are in the same paper).

I don't know if it is possible in perf_events subsystem and in your CPU to have two perf_event tasks active at same time: both count events in the target process and in the same time have sampling from other process. With advanced PMU this can be possible in the hardware and perf_events in modern kernel may allow it. But you give no details on your kernel version and your CPU vendor and family, so we can't answer this part.

You also may try other APIs to access PMU like PAPI or likwid (https://github.com/RRZE-HPC/likwid). Some of them may directly read PMU registers (sometimes MSR) and may allow sampling at the same time when counting is enabled.

like image 55
osgx Avatar answered Nov 07 '22 21:11

osgx