New posts in avx

Preventing GCC from automatically using AVX and FMA instructions when compiled with -mavx and -mfma

c++ gcc vectorization avx fma

Large (0,1) matrix multiplication using bitwise AND and popcount instead of actual int or float multiplies?

How to align stack at 32 byte boundary in GCC?

gcc stack sse avx

How to force gcc to use all SSE (or AVX) registers?

Horizontal XOR in AVX

c++ assembly x86 simd avx

Do 128bit cross lane operations in AVX512 give better performance?

performance x86 intel avx avx512

Parallel programming using Haswell architecture [closed]

sse cpu-architecture avx avx2

Does vzeroall zero registers ymm16 to ymm31?

assembly x86 intel avx avx512

Is L2 HW prefetcher really helpful?

AVX log intrinsics (_mm256_log_ps) missing in g++-4.8?

c++ g++ intrinsics avx

How to efficiently combine comparisons in SSE?

c optimization assembly sse avx

Fastest way to unpack 32 bits to a 32 byte SIMD vector

x86 simd avx bitmask avx2

Do all CPUs which support AVX2 also support SSE4.2 and AVX?

sse simd avx avx2

SSE runs slow after using AVX

c++ gcc x86 avx sse2

Does Clang have something like #pragma GCC target?

clang intrinsics avx pragma

What is the most efficient way to clear a single or a few ZMM registers on Knights Landing?

Packing and de-interleaving two __m256 registers

c++ x86 simd avx avx2

How to do an indirect load (gather-scatter) in AVX or SSE instructions?

c vector intel sse avx

Why both? vperm2f128 (avx) vs vperm2i128 (avx2)

intel simd avx avx2