New posts in avx2

Fully utilizing pipelines on Kaby Lake

How to concatenate two vectors efficiently using AVX2? (a lane-crossing version of VPALIGNR)

c simd intrinsics avx avx2

Counting 1 bits (population count) on large data using AVX-512 or AVX-2

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros

x86 sse simd avx avx2

Efficient way of rotating a byte inside an AVX register

c sse simd avx avx2

Count leading zero bits for each element in AVX2 vector, emulate _mm256_lzcnt_epi32

Optimal SIMD algorithm to rotate or transpose an array

Fast modulo-12 algorithm for 4 uint16_t's packed in a uint64_t

What do you do without fast gather and scatter in AVX2 instructions?

How to implement an efficient _mm256_madd_epi8?

c++ x86 simd intrinsics avx2

Efficient implementation of log2(__m256d) in AVX2

Parallel programming using Haswell architecture [closed]

sse cpu-architecture avx avx2

How can I add together two SSE registers?

c++ c intel sse avx2

Efficient way to set first N or last N bits of __m256i to 1, the rest to 0

Fastest way to unpack 32 bits to a 32 byte SIMD vector

x86 simd avx bitmask avx2

Do all CPUs which support AVX2 also support SSE4.2 and AVX?

sse simd avx avx2

AVX2 slower than SSE on Haswell

c++ x86 sse simd avx2

Is this incorrect code generation with arrays of __m256 values a clang bug?