BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?

1 Answer

Check "Software optimization resources" by Agner Fog: http://www.agner.org/optimize/

The BTB should be covered in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf

3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge

BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.
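To check a hot loop against the density limits quoted above, you can bucket branch addresses (for example, taken from a disassembly) into 16-byte blocks. This is only a sketch: it assumes the 16-byte windows are aligned blocks and counts each branch at its start address, neither of which the quote specifies, and `dense_blocks` is a hypothetical helper name.

```python
from collections import Counter


def dense_blocks(branch_addrs, max_per_block=3, block_size=16):
    """Return {block_start: count} for blocks holding more than max_per_block branches.

    Assumptions (not stated by Agner): blocks are aligned to block_size,
    and each branch is counted at its starting address.
    """
    counts = Counter(addr - addr % block_size for addr in branch_addrs)
    return {start: n for start, n in sorted(counts.items()) if n > max_per_block}


# Four branches start in the aligned block at 0x1000, one in the block at 0x1010.
print({hex(k): v for k, v in dense_blocks([0x1000, 0x1002, 0x1005, 0x100A, 0x1010]).items()})
# prints {'0x1000': 4}
```

Passing only call sites with `max_per_block=4` checks the "four call instructions per 16 bytes" limit instead.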

3.8 Branch prediction in Intel Haswell, Broadwell and Skylake

BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.

Intel may describe some related details in the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), around section "3.4.1 Branch Prediction Optimization", but it still gives no sizes.

It may look strange, but there was no information about the BTB in CPUID in 1998-2000, and it is still not listed in http://www.felixcloutier.com/x86/CPUID.html or in other public materials from Intel employees. From http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany):

 * This table describes the possible cache and TLB configurations
 * as documented by Intel. For now AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in generic Pentii2).
 * With 80 possible caches we are on the safe side for one or two years.
 *
 * Strange enough no BHT, BTB or return stack data is given this way...

There should be some performance monitoring unit (PMU) counters for the BTB, and there are experiments that estimate the BTB size by running specially crafted test programs; see http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt:

Conclusions

From these results, it seems Ivy Bridge (and therefore probably Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.
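The "4096 entries split over 1024 sets of 4 ways" figure is easier to picture with a toy model. The sketch below simulates a set-associative BTB with those dimensions. The set-index function (address bits above a 16-byte granule) and the LRU replacement policy are assumptions chosen for illustration, not what the blog post measured, and `ToyBTB` and `misses_per_iteration` are hypothetical names. It shows why such experiments vary both the number of branches and their spacing: misses appear either when the total exceeds the capacity or when more than 4 branches land in the same set.

```python
from collections import OrderedDict


class ToyBTB:
    """Toy set-associative BTB model (hypothetical; not Intel's real design)."""

    def __init__(self, num_sets=1024, ways=4, granule=16):
        self.num_sets = num_sets
        self.ways = ways
        self.granule = granule
        # One LRU-ordered dict per set; keys are full branch addresses (full tags).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def lookup(self, addr):
        """Return True on a hit; on a miss, insert addr, evicting the LRU way if full."""
        entries = self.sets[(addr // self.granule) % self.num_sets]  # assumed index function
        if addr in entries:
            entries.move_to_end(addr)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict least recently used (assumed policy)
        entries[addr] = True
        return False


def misses_per_iteration(num_branches, spacing, rounds=3):
    """Run a loop of equally spaced taken branches and count steady-state misses."""
    btb = ToyBTB()
    addrs = [i * spacing for i in range(num_branches)]
    for addr in addrs:  # warm-up pass fills the table
        btb.lookup(addr)
    misses = sum(not btb.lookup(addr) for _ in range(rounds) for addr in addrs)
    return misses / rounds


print(misses_per_iteration(4096, 16))      # every set holds exactly 4 branches: 0.0
print(misses_per_iteration(4097, 16))      # set 0 now holds 5 branches: 5.0
print(misses_per_iteration(5, 16 * 1024))  # all 5 branches map to one 4-way set: 5.0
```

With LRU and a cyclic access pattern, one branch too many in a set makes every branch in that set miss, which produces the sharp knees these experiments look for.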

For Haswell it seems a new approach for determining sets has been taken, along with a new approach to evicting entries.

See also his other posts about branch prediction and the related performance events:

http://xania.org/201602/bpu-part-one Static branch prediction on newer Intel processors
http://xania.org/201602/bpu-part-two Branch prediction - part two
http://xania.org/201602/bpu-part-three The BTB in contemporary Intel chips
http://xania.org/201602/bpu-part-four Branch Target Buffer, part 2

His code is public and based on Agner's tests: https://github.com/mattgodbolt/agner (in particular https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py).
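For a rough idea of what such a test program can look like, here is a hypothetical generator (it is not the code in that repository) that emits NASM source for a chain of unconditional jumps, one per `spacing`-byte aligned slot. The general idea is to call `jump_chain` in a timed loop while sweeping `num_branches` and `spacing` and reading branch-related PMU counters; the NASM syntax, the `jump_chain` symbol and the `.bN` label names are assumptions for illustration.

```python
def make_jump_chain(num_branches: int, spacing: int) -> str:
    """Emit NASM source for `num_branches` taken jumps, one per `spacing`-byte slot.

    Hypothetical helper: each `jmp` targets the next aligned slot and the
    final slot returns, so `call jump_chain` executes every branch once.
    """
    # Require a power of two large enough to hold a 5-byte near jmp.
    if spacing < 8 or spacing & (spacing - 1):
        raise ValueError("spacing must be a power of two >= 8")
    lines = ["bits 64", "section .text", "global jump_chain", "jump_chain:"]
    for i in range(num_branches):
        lines.append(f"    align {spacing}")  # ALIGN's default fill in NASM is NOP
        lines.append(f".b{i}: jmp .b{i + 1}")
    lines.append(f"    align {spacing}")
    lines.append(f".b{num_branches}: ret")
    return "\n".join(lines) + "\n"


print(make_jump_chain(2, 64))
```

Larger `spacing` values push more branches into the same BTB set under any index function that uses low address bits, which is how associativity can be probed separately from total capacity.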

answered Oct 20 '22 22:10

osgx


Tags:

cpu-architecture

branch-prediction

x86

cpu

intel

samira
