New posts in avx2

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros

x86 sse simd avx avx2

Efficient way of rotating a byte inside an AVX register

c sse simd avx avx2

Count leading zero bits for each element in AVX2 vector, emulate _mm256_lzcnt_epi32

Optimal SIMD algorithm to rotate or transpose an array

Fast modulo-12 algorithm for 4 uint16_t's packed in a uint64_t

What do you do without fast gather and scatter in AVX2 instructions?

How to implement an efficient _mm256_madd_epi8?

c++ x86 simd intrinsics avx2

Efficient implementation of log2(__m256d) in AVX2

Parallel programming using Haswell architecture [closed]

sse cpu-architecture avx avx2

How can I add together two SSE registers

c++ c intel sse avx2

Efficient way to set first N or last N bits of __m256i to 1, the rest to 0

Fastest way to unpack 32 bits to a 32 byte SIMD vector

x86 simd avx bitmask avx2

Do all CPUs which support AVX2 also support SSE4.2 and AVX?

sse simd avx avx2

AVX2 slower than SSE on Haswell

c++ x86 sse simd avx2

Is this incorrect code generation with arrays of __m256 values a clang bug?

Packing and de-interleaving two __m256 registers

c++ x86 simd avx avx2

Fallback implementation for conflict detection in AVX2

c++ x86 intrinsics avx2 avx512

Why both? vperm2f128 (avx) vs vperm2i128 (avx2)

intel simd avx avx2