New posts in micro-architecture

How do modern x86 processors actually compute multiplications?

How is the transitivity/cumulativity property of memory barriers implemented micro-architecturally?

Conditional jump instructions in MSROM procedures?

Why does jnz require 2 cycles to complete in an inner loop?

What are the "long" and "short" scoreboards w.r.t. MIO/L1TEX?

Are load ops deallocated from the RS when they dispatch, complete or some other time?

How to tell length of an x86-64 instruction opcode using CPU itself?

Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Adding an extra load speeds it up?

How do the store buffer and Line Fill Buffer interact with each other?

How does the indexing of the Ice Lake's 48KiB L1 data cache work?

Any reason to use BX R over MOV pc, R except thumb interwork pre ARMv7?

How are barriers/fences and acquire, release semantics implemented microarchitecturally?

Adding a redundant assignment speeds up code when compiled without optimization

Return stack buffer?

Why isn't there a data bus which is as wide as the cache line size?

Does memory dependence speculation prevent BN_consttime_swap from being constant-time?