Is there any way to determine, or any resource where I can find, the Branch Target Buffer (BTB) size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors?
Check the software optimization resources by Agner Fog: http://www.agner.org/optimize/
BTB details should be in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf
3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge
BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.
3.8 Branch prediction in Intel Haswell, Broadwell and Skylake
BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.
Intel may describe some of this in the "Intel 64 and IA-32 Architectures Optimization Reference Manual" http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html around section "3.4.1 Branch Prediction Optimization", but it still gives no sizes.
It may look strange, but there was no information about the BTB in CPUID back in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany). It is still not listed in http://www.felixcloutier.com/x86/CPUID.html or in public materials from Intel employees. A comment in that file says:
 * This table describes the possible cache and TLB configurations
 * as documented by Intel. For now AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in generic Pentii2).
 * With 80 possible caches we are on the safe side for one or two years.
 *
 * Strange enough no BHT, BTB or return stack data is given this way...
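To make the comment above concrete, here is a minimal sketch of how the cache/TLB descriptor bytes from CPUID leaf 2 are unpacked. It only decodes register values passed in as integers; it does not execute CPUID itself. The function name `decode_leaf2` and the sample register values are made up for illustration, and the descriptor table is a small excerpt, not the full list. Note that nothing in it describes a BTB, which is the point of the comment.

```python
# Partial descriptor table: a few entries from Intel's CPUID leaf 2
# documentation. The real table is much longer.
DESCRIPTORS = {
    0x01: "Instruction TLB: 4 KByte pages, 4-way, 32 entries",
    0x2C: "L1 data cache: 32 KBytes, 8-way, 64-byte lines",
    0x30: "L1 instruction cache: 32 KBytes, 8-way, 64-byte lines",
    0xF0: "64-byte prefetching",
    0xFF: "No cache data in leaf 2; query CPUID leaf 4 instead",
}


def decode_leaf2(eax, ebx, ecx, edx):
    """Return descriptions for the descriptor bytes held in the four registers."""
    found = []
    for reg_index, reg in enumerate((eax, ebx, ecx, edx)):
        if reg & 0x80000000:
            continue  # bit 31 set: this register holds no valid descriptors
        for byte_index in range(4):
            if reg_index == 0 and byte_index == 0:
                continue  # low byte of EAX is an iteration count, not a descriptor
            value = (reg >> (8 * byte_index)) & 0xFF
            if value == 0x00:
                continue  # null descriptor
            found.append(DESCRIPTORS.get(value, "unknown descriptor 0x%02X" % value))
    return found


# Hypothetical register values, invented for this example only.
for line in decode_leaf2(0x00F03001, 0x0000002C, 0x80000000, 0x00000000):
    print(line)
```

Every descriptor in this scheme names a cache, a TLB, or a prefetch size, so there is no slot through which a BTB size could be reported.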
There should be some performance monitoring unit (PMU) counters for the BTB, and there are experiments that derive the BTB size by running special test programs; see http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt:
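The general shape of such a test program is a long chain of branches placed at a controlled spacing, which is then timed or measured with PMU counters for different chain lengths. The sketch below only generates the assembly text for such a chain. It is not taken from the linked article or its repository: the function name `make_jump_chain`, the label names, and the choice of NASM-style syntax are all assumptions made for illustration.

```python
def make_jump_chain(n_branches, spacing):
    """Return NASM-style source with n_branches unconditional jumps,
    one every `spacing` bytes, each jumping to the next one."""
    if spacing < 8 or spacing & (spacing - 1):
        # A near jmp needs up to 5 bytes, and `align` wants a power of two.
        raise ValueError("spacing must be a power of two and at least 8")
    lines = []
    for i in range(n_branches):
        lines.append("align %d" % spacing)
        lines.append("branch_%d:" % i)
        lines.append("    jmp branch_%d" % (i + 1))
    # Final landing label so the last jump has a target.
    lines.append("align %d" % spacing)
    lines.append("branch_%d:" % n_branches)
    return "\n".join(lines)


print(make_jump_chain(3, 16))
```

Sweeping `n_branches` and `spacing` and watching where mispredictions start to rise is what lets an experiment infer the number of entries, sets and ways.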
Conclusions
From these results, it seems Ivy Bridge (and therefore probably Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.
For Haswell it seems a new approach for determining sets has been taken, along with a new approach to evicting entries.
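To see why a table of 4096 entries in 1024 sets of 4 ways shows up in measurements, here is a toy simulation. It is a sketch under loud assumptions: the set index is taken as address bits just above the low 4 bits, replacement is plain LRU, and the full address is used as the tag. None of these details is confirmed by the quoted text, and the class `ToyBTB` and function `steady_state_misses` are invented for this example. It models only the Ivy Bridge figures above, not Haswell.

```python
from collections import OrderedDict


class ToyBTB:
    """Toy set-associative table: 1024 sets x 4 ways = 4096 entries."""

    def __init__(self, sets=1024, ways=4, index_shift=4):
        self.sets = sets
        self.ways = ways
        self.index_shift = index_shift  # ASSUMPTION: index starts at address bit 4
        self.table = [OrderedDict() for _ in range(sets)]

    def lookup(self, address):
        """Return True on a hit; on a miss, insert and evict the LRU entry."""
        entries = self.table[(address >> self.index_shift) % self.sets]
        if address in entries:
            entries.move_to_end(address)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # ASSUMPTION: plain LRU eviction
        entries[address] = True
        return False


def steady_state_misses(n_branches, spacing):
    """Misses on a second pass over n_branches branches placed `spacing` bytes apart."""
    btb = ToyBTB()
    addresses = [i * spacing for i in range(n_branches)]
    for a in addresses:  # warm-up pass fills the table
        btb.lookup(a)
    return sum(not btb.lookup(a) for a in addresses)


print(steady_state_misses(4096, 16))       # 0: every set holds exactly 4 branches
print(steady_state_misses(4097, 16))       # 5: one set now holds 5 and thrashes
print(steady_state_misses(4, 16 * 1024))   # 0: 4 branches fit in one set's 4 ways
print(steady_state_misses(5, 16 * 1024))   # 5: a 5th branch in the same set thrashes
```

In this model the miss count jumps as soon as the branch count exceeds the total capacity, or as soon as more than four branches collide in one set, which is the kind of step an experiment looks for when inferring table geometry.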
He has also written more posts about branch prediction and its events.
His code is public and based on Agner's tests: https://github.com/mattgodbolt/agner (see https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py).