New posts in micro-optimization

repz ret: why all the hassle?

Why _umul128 works slower than scalar code for mul128x64x2 function?

How to implement the totalOrder predicate for floating point values?

ARM Cortex M0+: How to use "Branch if Carry" instructions in C-code?

can array access be optimized?

x86 Assembly pushad/popad, How fast it is?

Optimize nested loops for pattern-filling an array, to help the compiler produce efficient ARM assembly?

Why this unnecessary MOVAPD copy in gcc 9.1, in a tiny function

x86 opcode alignment references and guidelines

JVM first 4 booleans optimized, not 5th

Is there a difference in performance between the child and descendant selectors?

According to Intel my cache should be 24-way associative though its 12-way, how is that?

Can two instructions execute in the same cycle if the same register is used as input and output respectively?

Instruction reordering in x86 / x64 asm - performance optimisation with latest CPUs

Is there some benefit in the following assembly commands?

Does optimizing code in TI-BASIC actually make a difference?

Conflicting signs in x86 assembly: movsx then unsigned compare/branch?

str_replace with strpos?

Are compilers able to avoid branching instructions?

How to get lg2 of a number that is 2^k