
Can we measure successful store-forwarding with Intel's performance counters?

Is it possible to measure the number of successful store-forwarding operations using the performance counters on recent Intel x86 chips?

I see events such as ld_blocks.store_forward, which count failed store forwarding, but it's not clear to me whether the successful case can be measured.

Asked by BeeOnRope, Sep 09 '17 22:09


2 Answers

I don't see anything more than you did for SKL, but older uarches may have more details:

For Core2 (what Intel confusingly calls the Core microarchitecture), the optimization manual documents (in B.7 EVENT RATIOS FOR INTEL CORE MICROARCHITECTURE):

B.7.5.2 4K Aliasing and Store Forwarding Block Detection

  1. Loads Blocked by Overlapping Store Rate: LOAD_BLOCK.OVERLAP_STORE/CPU_CLK_UNHALTED.CORE

4K aliasing and store forwarding block are two different scenarios in which loads are blocked by preceding stores due to different reasons. Both scenarios are detected by the same event: LOAD_BLOCK.OVERLAP_STORE. A high value for “Loads Blocked by Overlapping Store Rate” indicates that either 4K aliasing or store forwarding block may affect performance.

This may count both stalled and successful store forwarding (and also 4K aliasing, so you need to avoid that or subtract it).

B.7.5.3 Load Block by Preceding Stores

  1. Loads Blocked by Unknown Store Address Rate: LOAD_BLOCK.STA / CPU_CLK_UNHALTED.CORE

A high value for “Loads Blocked by Unknown Store Address Rate” indicates that loads are frequently blocked by preceding stores with unknown address and implies performance penalty.

  2. Loads Blocked by Unknown Store Data Rate: LOAD_BLOCK.STD / CPU_CLK_UNHALTED.CORE

A high value for “Loads Blocked by Unknown Store Data Rate” indicates that loads are frequently blocked by preceding stores with unknown data and implies performance penalty.

These last two counters would appear to count successful store forwarding, but only in cases where the load actually had to wait after detecting the (possible) overlap.
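Each of the manual's ratios is just a raw event count divided by CPU_CLK_UNHALTED.CORE. A minimal Python sketch, assuming you have already collected the four counts on a Core2 machine with whatever profiler you use; the dictionary keys are simply the manual's event names used as labels, and the function and constant names are hypothetical. Note that these are per-cycle rates, not per-load fractions, so none of them directly yields a count of successful forwards.

```python
from typing import Dict

# Ratio name -> numerator event, following B.7.5.2 / B.7.5.3 (Core2 only).
CORE2_LOAD_BLOCK_RATIOS: Dict[str, str] = {
    "Loads Blocked by Overlapping Store Rate": "LOAD_BLOCK.OVERLAP_STORE",
    "Loads Blocked by Unknown Store Address Rate": "LOAD_BLOCK.STA",
    "Loads Blocked by Unknown Store Data Rate": "LOAD_BLOCK.STD",
}
CYCLES_EVENT = "CPU_CLK_UNHALTED.CORE"


def load_block_ratios(counts: Dict[str, int]) -> Dict[str, float]:
    """Divide each numerator event by unhalted core cycles, as the manual's formulas do."""
    cycles = counts[CYCLES_EVENT]
    if cycles <= 0:
        raise ValueError("need a positive CPU_CLK_UNHALTED.CORE count")
    return {name: counts[event] / cycles
            for name, event in CORE2_LOAD_BLOCK_RATIOS.items()}


if __name__ == "__main__":
    # Made-up counts for illustration only; substitute values from your own run.
    example = {CYCLES_EVENT: 1_000_000, "LOAD_BLOCK.OVERLAP_STORE": 50_000,
               "LOAD_BLOCK.STA": 10_000, "LOAD_BLOCK.STD": 20_000}
    for name, ratio in load_block_ratios(example).items():
        print(f"{name}: {ratio:.3f}")
```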

Answered by Peter Cordes, Oct 11 '22 20:10


There is no documented event that counts successful store-forwarding operations. However, I have experimentally identified a set of undocumented events that serve this purpose on Haswell and Broadwell. Specifically, any event with event code 0x2 and an odd umask value (any odd number, such as 1) appears to count successful store forwarding very accurately: the counts match what is expected and the standard deviation is practically zero. I think the same events can be used on later (and even earlier) microarchitectures. Again, none of these events are documented.
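Because the event is undocumented, there is no symbolic name for it, so it has to be requested as a raw event. Below is a minimal Python sketch, assuming Linux `perf` is installed and that the raw config for an Intel core event places the event select in bits 0-7 and the umask in bits 8-15, the same layout as IA32_PERFEVTSELx. With that layout, event 0x02 with umask 0x01 becomes `r102`. The helper names and the `STORE_FWD_EVENT` constant are hypothetical. Treat any counts as something to validate against a microbenchmark with a known number of store/reload pairs.

```python
import shutil
import subprocess
from typing import List, Optional

# Hypothetical constant: event code 0x02 (with an odd umask) is the undocumented
# encoding reported above for Haswell/Broadwell, not an Intel-documented event.
STORE_FWD_EVENT = 0x02


def raw_config(event: int, umask: int) -> int:
    """Pack event select (bits 0-7) and umask (bits 8-15), as in IA32_PERFEVTSELx."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return (umask << 8) | event


def raw_spec(event: int, umask: int) -> str:
    """perf's raw-event syntax: 'r' followed by the config value in hex."""
    return f"r{raw_config(event, umask):x}"


def pmu_spec(event: int, umask: int) -> str:
    """Equivalent perf PMU-term syntax for the core PMU."""
    return f"cpu/event={event:#04x},umask={umask:#04x}/"


def parse_count(csv_line: str, sep: str = ";") -> Optional[int]:
    """The first field of a `perf stat -x SEP` line is the count (or e.g. '<not supported>')."""
    field = csv_line.split(sep, 1)[0].strip()
    return int(field) if field.isdigit() else None


def measure(cmd: List[str], umask: int = 0x01) -> Optional[int]:
    """Run `perf stat` on cmd; return the count, or None if perf is missing or the event is rejected."""
    if shutil.which("perf") is None:
        return None
    spec = raw_spec(STORE_FWD_EVENT, umask)
    # perf stat writes its report to stderr; this assumes it echoes the event name 'rNNN'.
    result = subprocess.run(["perf", "stat", "-x", ";", "-e", spec, "--", *cmd],
                            capture_output=True, text=True)
    for line in result.stderr.splitlines():
        if spec in line:
            return parse_count(line)
    return None


if __name__ == "__main__":
    for u in (0x01, 0x03, 0x05):  # a few of the odd umasks
        print(raw_spec(STORE_FWD_EVENT, u), pmu_spec(STORE_FWD_EVENT, u))
```

For example, `measure(["./bench"])`, where `./bench` is a hypothetical binary that performs N store/reload pairs, should return a count close to N if the event behaves as described. A `;` separator is used because the PMU-term form itself contains a comma.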

Answered by Hadi Brais, Oct 11 '22 20:10