I'd like to use hardware performance counters, specifically on x86 CPUs, to measure events such as cache misses or branch mispredictions. These counters are heavily used by advanced profilers like Intel VTune. Please don't confuse them with the performance counters provided by the Windows operating system.
To use these counters in a C/C++ program, one can use PAPI: http://icl.cs.utk.edu/papi/
PAPI makes performance counters easy to use, but only on Linux. It once supported Windows, but no longer does.
Has anyone recently tried PAPI or another API to use hardware performance counters on Windows?
You can view performance counters using the Microsoft Windows Reliability and Performance Monitor application. Click Start > Run. In the Open field, enter perfmon, and then click OK. From Monitoring Tools, select Performance Monitor.
Windows Performance Counters provide a high-level abstraction layer that provides a consistent interface for collecting various kinds of system data such as CPU, memory, and disk usage. System administrators often use performance counters to monitor systems for performance or behavior problems.
Performance counters are bits of code that monitor, count, or measure events in software, which allow us to see patterns from a high-level view. They are registered with the operating system during installation of the software, allowing anyone with the proper permissions to view them.
You can use the RDPMC instruction or the MSVC compiler intrinsic __readpmc, which compiles to the same instruction.
However, Windows prevents user-mode applications from executing this instruction by setting CR4.PCE to 0. Presumably, this is done because the meaning of each counter is determined by MSRs (model-specific registers), which are only accessible in kernel mode. In other words, unless your code runs as a kernel-mode module (e.g. a device driver), you will get a "privileged instruction" trap if you attempt to execute this instruction.
If you're writing a user-mode application, your only option is (as @Christopher mentioned in comments) to write a kernel module which would execute this instruction for you (you'll incur user->kernel call penalty) and enable test signing on your machine so your presumably self-signed "driver" can be loaded. This means you can't easily distribute this app, but that'll work for in-house tuning.
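As a rough illustration of the RDPMC route described above, here is a minimal C sketch. The helper names (pmc_general_index, pmc_fixed_index, read_pmc) are hypothetical and chosen only for this example. It assumes the Intel convention that bit 30 of the counter index selects the fixed-function counters; which event a general-purpose counter actually counts still has to be programmed through MSRs by kernel-mode code, which this sketch does not do.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>   /* declares the __readpmc intrinsic */
#endif

/* Assumption: Intel convention, bit 30 of the RDPMC index selects
   the fixed-function counters instead of the general-purpose ones. */
#define PMC_FIXED_FLAG 0x40000000u

/* Hypothetical helper: RDPMC index of general-purpose counter n. */
static inline uint32_t pmc_general_index(uint32_t n)
{
    return n;
}

/* Hypothetical helper: RDPMC index of fixed-function counter n. */
static inline uint32_t pmc_fixed_index(uint32_t n)
{
    return PMC_FIXED_FLAG | n;
}

/* Reads one performance counter. This traps with a "privileged
   instruction" error in user mode when CR4.PCE is 0, as on Windows,
   so it is only meant to be called from kernel-mode code. */
static inline uint64_t read_pmc(uint32_t index)
{
#if defined(_MSC_VER)
    return __readpmc(index);
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t lo, hi;
    /* RDPMC takes the counter index in ECX and returns EDX:EAX. */
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(index));
    return ((uint64_t)hi << 32) | lo;
#else
    (void)index;
    return 0;   /* not an x86 target: no RDPMC available */
#endif
}
```

A driver would typically call read_pmc before and after the code being measured and report the difference to the user-mode application. Calling read_pmc directly from a normal Windows process is expected to fail for the reason given above.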
What about this HCP Reference? Does it not provide what you want?