New posts in avx

Scatter intrinsics in AVX

intrinsics avx avx2

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? Or not using that insn at all

gcc assembly x86 sse avx

RyuJIT not making full use of SIMD intrinsics

c# sse simd avx ryujit

Unaligned load versus unaligned store

When the compiler reorders AVX instructions on Sandy, does it affect performance?

Is it worth bothering to align AVX-256 memory stores?

Why do SSE instructions preserve the upper 128-bit of the YMM registers?

performance x86 avx

Is NOT missing from SSE, AVX?

How to solve the 32-byte-alignment issue for AVX load/store operations?

Transpose an 8x8 float using AVX/AVX2

simd avx avx2

How to find the horizontal maximum in a 256-bit AVX vector

AVX VMOVDQA slower than two SSE MOVDQA?

How to sum __m256 horizontally?

Loop unrolling to achieve maximum throughput with Ivy Bridge and Haswell

c++ x86 intel sse avx

Does ICC satisfy C99 specs for multiplication of complex numbers?

How to rotate an SSE/AVX vector

c x86 sse intrinsics avx

Disable AVX-optimized functions in glibc (LD_HWCAP_MASK, /etc/ld.so.nohwcap) for valgrind & gdb record

linux linker gdb glibc avx

Choice between aligned vs. unaligned x86 SIMD instructions

x86 sse simd avx avx512

How to use the Intel AVX in Java?

java simd avx

How are the gather instructions in AVX2 implemented?

intel ram simd avx avx2