
New posts in micro-optimization

Assembly function address table and data under the function or in data section

Fastest way to set a single memory cell to zero or a constant in x86 assembly?

How to swap 2 bits in a 1-byte number

Bit packing of groups of n repeated bits in a 32-bit word, compact to 1 bit per group

Can the compiler/JIT optimize away short-circuit evaluation if there are no side-effects?

Understanding a specific CIL / CLR optimization

Fastest way to take the average of two signed integers in x86 assembly?

Why do C compilers still prefer push over mov for saving registers, even when mov appears faster in llvm-mca?

Is the fall-through side of a conditional branch more efficient? Is it a good idea to make that the error-handling side?

Efficient UTF-8 character-length decoding for a non-zero character in a 32-bit register

Advantage of using LEA over MOV for passing parameters in Assembly compiled from C++

Is there a faster algorithm for max(ctz(x), ctz(y))?

repz ret: why all the hassle?

Why does _umul128 run slower than scalar code for a mul128x64x2 function?

How to implement the totalOrder predicate for floating point values?

ARM Cortex M0+: How to use "Branch if Carry" instructions in C-code?

Can array access be optimized?

How to get lg2 of a number that is 2^k

Why is my operator ++ more than twice as fast as its equivalent instance method?

_mm256_fmadd_ps is slower than _mm256_mul_ps + _mm256_add_ps?