New posts in avx

GCC emits vastly different code using "-march=native" on similar architectures

c gcc assembly sse avx

How to quickly count bits into separate bins in a series of ints on Sandy Bridge? [duplicate]

c++ assembly x86 simd avx

Scatter intrinsics in AVX

intrinsics avx avx2

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? Or not using that insn at all

gcc assembly x86 sse avx

RyuJIT not making full use of SIMD intrinsics

c# sse simd avx ryujit

Unaligned load versus unaligned store

When the compiler reorders AVX instructions on Sandy, does it affect performance?

Is it worth bothering to align AVX-256 memory stores?

Why do SSE instructions preserve the upper 128-bit of the YMM registers?

performance x86 avx

Is NOT missing from SSE, AVX?

How to solve the 32-byte-alignment issue for AVX load/store operations?

Transpose an 8x8 float using AVX/AVX2

simd avx avx2

How to find the horizontal maximum in a 256-bit AVX vector

AVX VMOVDQA slower than two SSE MOVDQA?

How to sum __m256 horizontally?

Loop unrolling to achieve maximum throughput with Ivy Bridge and Haswell

c++ x86 intel sse avx

Does ICC satisfy C99 specs for multiplication of complex numbers?

How to rotate an SSE/AVX vector

c x86 sse intrinsics avx

Disable AVX-optimized functions in glibc (LD_HWCAP_MASK, /etc/ld.so.nohwcap) for valgrind & gdb record

linux linker gdb glibc avx

Choice between aligned vs. unaligned x86 SIMD instructions

x86 sse simd avx avx512