New posts in micro-optimization

X86: How to set lower half of xmm0 to 0, without affecting the upper half?

Bottleneck when using indexed addressing modes

Loading an xmm from GP regs

68000 Assembly – Build a String from Characters *not* Present in Another & Return Its Length (stack-passed params)

Access of struct member faster if located <128 bytes from start?

Does the llvm-bolt instrumentation mode result in less accurate BOLT profiles?

How do you reason about fluctuations in benchmarking data?

Fastest way to set highest order bit of rax register to lowest order bit in rdx register

Optimized 53->32 bit modulo computation on 32-bit processors

Set an XMM register to a repeating byte pattern (broadcast a constant byte)

Performance / Space implications when ordering SQL Server columns?

Using the operand-size override prefix 0x66 for instruction alignment

Assembly function address table and data under the function or in data section

Fastest way to set a single memory cell to zero or a constant in x86 assembly?

How to exchange between 2 bits in a 1-byte number

Bit packing of groups of n repeated bits in a 32-bit word, compact to 1 bit per group

Can the compiler/JIT optimize away short-circuit evaluation if there are no side-effects?

Understanding a specific CIL / CLR optimization

Fastest way to take the average of two signed integers in x86 assembly?

Why do C compilers still prefer push over mov for saving registers, even when mov appears faster in llvm-mca?