
New posts in avx2

Using a variable to index a simd vector with _mm256_extract_epi32() intrinsic

simd intrinsics avx avx2

Why do processors with only AVX out-perform AVX2 processors for many SIMD algorithms?

c# c++ simd avx avx2

Does /arch:AVX enable AVX2?

Best way to load/store from/to general purpose registers to/from xmm/ymm register

assembly x86 simd sse2 avx2

Fully utilizing pipelines on Kaby Lake

How to concatenate two vectors efficiently using AVX2? (a lane-crossing version of VPALIGNR)

c simd intrinsics avx avx2

Counting 1 bits (population count) on large data using AVX-512 or AVX-2

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros

x86 sse simd avx avx2

Efficient way of rotating a byte inside an AVX register

c sse simd avx avx2

Count leading zero bits for each element in AVX2 vector, emulate _mm256_lzcnt_epi32

Optimal SIMD algorithm to rotate or transpose an array

Fast modulo-12 algorithm for 4 uint16_t's packed in a uint64_t

What do you do without fast gather and scatter in AVX2 instructions?

How to implement an efficient _mm256_madd_epi8?

c++ x86 simd intrinsics avx2

Efficient implementation of log2(__m256d) in AVX2

Parallel programming using Haswell architecture [closed]

sse cpu-architecture avx avx2

How can I add together two SSE registers

c++ c intel sse avx2