New posts in micro-optimization

Is performance reduced when executing loops whose uop count is not a multiple of processor width?

Why date() works twice as fast if we set time zone from code?

Why does n++ execute faster than n=n+1?

Why does breaking the "output dependency" of LZCNT matter?

'Correct' unsigned integer comparison

Why are loops always compiled into "do...while" style (tail jump)?

Go: multiple len() calls vs performance?

x86_64 best way to reduce 64 bit register to 32 bit retaining zero or non-zero status

Can x86's MOV really be "free"? Why can't I reproduce this at all?

x > -1 vs x >= 0, is there a performance difference

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

Avoiding the overhead of C# virtual calls

Fastest way to negate a number

Passing null pointer to placement new

Does calculating Sqrt(x) as x * InvSqrt(x) make any sense in the Doom 3 BFG code?

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent

Why does Intel's compiler prefer NEG+ADD over SUB?

Comparing two values in the form (a + sqrt(b)) as fast as possible?

INC instruction vs ADD 1: Does it matter?

Do Java finals help the compiler create more efficient bytecode? [duplicate]