New posts in micro-optimization

Why are bitwise operators slower than multiplication/division/modulo?

Is thread time spent in synchronization too high?

Does calling the constructor of an empty class actually use any memory?

Faster implementation of Math.round?

Java: micro-optimizing array manipulation

Check the existence of a HashMap key

Extreme optimization of integer binary search

Why is `arr.take(idx)` faster than `arr[idx]`?

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads a ZMM register, writing a k mask?

What are the costs of failed store-to-load forwarding on x86?

What's the most efficient way to perform bitwise operations on a C array?

SSE micro-optimization instruction order

AND faster than integer modulo operation?

LINQ Count() until, is this more efficient?

Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?

Why does declaring a counter variable outside of a nested function make a loop 5x slower?

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256?

Inlining of a recursive function