New posts in avx

Horizontal minimum and maximum using SSE

c++ max sse minimum avx
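A minimal sketch of the usual shuffle-and-reduce approach follows. It assumes x86-64, where SSE is always available, and GCC or Clang. The helper names `hmin_ps`, `hmax_ps` and `min_array` are hypothetical, and NaN inputs get no special handling.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Horizontal minimum of the four floats in v (hypothetical helper, SSE only). */
static float hmin_ps(__m128 v)
{
    /* Swap neighbouring lanes: t = [v1, v0, v3, v2]. */
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    v = _mm_min_ps(v, t);          /* [min01, min01, min23, min23] */
    t = _mm_movehl_ps(t, v);       /* lane 0 of t = min23 */
    v = _mm_min_ss(v, t);          /* lane 0 = minimum of all four */
    return _mm_cvtss_f32(v);
}

/* Horizontal maximum: the same pattern with max instead of min. */
static float hmax_ps(__m128 v)
{
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    v = _mm_max_ps(v, t);
    t = _mm_movehl_ps(t, v);
    v = _mm_max_ss(v, t);
    return _mm_cvtss_f32(v);
}

/* Minimum of an array: vertical mins inside the loop, a single horizontal
   reduction at the end. Assumes n is a non-zero multiple of 4. */
static float min_array(const float *p, size_t n)
{
    assert(n > 0 && n % 4 == 0);
    __m128 acc = _mm_loadu_ps(p);
    for (size_t i = 4; i < n; i += 4)
        acc = _mm_min_ps(acc, _mm_loadu_ps(p + i));
    return hmin_ps(acc);
}
```

The shuffle-based reduction runs only once, after the loop. The loop body itself stays at one vertical `min` per four elements.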

Using SIMD on amd64, when is it better to use more instructions vs. loading from memory?

Half-precision floating-point arithmetic on Intel chips

Unexpectedly good performance with OpenMP parallel for loop

Aligned and unaligned memory access with AVX/AVX2 intrinsics

gcc avx avx2
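The following sketch contrasts the two AVX load intrinsics. `_mm256_load_ps` requires a 32-byte-aligned address, while `_mm256_loadu_ps` accepts any address. The sketch makes several assumptions: C11 `aligned_alloc`, GCC/Clang's `target("avx")` function attribute, and a caller that has already confirmed AVX support. The names `alloc_floats32` and `sum_avx` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <immintrin.h>

/* Hypothetical helper: n floats on a 32-byte boundary. The size is rounded up
   to a multiple of the alignment, as C11 aligned_alloc expects. Free with free(). */
static float *alloc_floats32(size_t n)
{
    size_t bytes = (n * sizeof(float) + 31) & ~(size_t)31;
    return aligned_alloc(32, bytes ? bytes : 32);
}

/* Hypothetical helper: sums n floats (n a multiple of 8) with AVX. */
__attribute__((target("avx")))
static float sum_avx(const float *p, size_t n)
{
    assert(n % 8 == 0);
    const int aligned = ((uintptr_t)p % 32) == 0;
    __m256 acc = _mm256_setzero_ps();
    for (size_t i = 0; i < n; i += 8) {
        /* _mm256_load_ps may compile to vmovaps, which faults on a misaligned
           address; _mm256_loadu_ps (vmovups) is valid for any address. */
        __m256 x = aligned ? _mm256_load_ps(p + i) : _mm256_loadu_ps(p + i);
        acc = _mm256_add_ps(acc, x);
    }
    /* Store the eight partial sums and finish the reduction in scalar code. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int j = 0; j < 8; ++j)
        s += lanes[j];
    return s;
}
```

Calling `sum_avx(a + 1, ...)` on an aligned buffer exercises the unaligned path.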

Efficiently find least significant set bit in a large array?
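A minimal SSE2 sketch follows. It assumes the array is a run of 64-bit words in which bit i of `words[k]` counts as global bit 64k+i. It also relies on GCC/Clang's `__builtin_ctzll`, and `first_set_bit` is a hypothetical name. Each iteration tests 16 bytes for zero, and only the first non-zero pair falls back to scalar code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: index of the lowest set bit across words[0..n),
   or -1 if every word is zero. */
static long long first_set_bit(const uint64_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    const __m128i zero = _mm_setzero_si128();
    size_t k = 0;
    /* Test two 64-bit words (16 bytes) per iteration. */
    for (; k + 2 <= n; k += 2) {
        __m128i x = _mm_loadu_si128((const __m128i *)(words + k));
        /* The byte mask is 0xFFFF only if all 16 bytes equal zero. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(x, zero)) != 0xFFFF)
            break;
    }
    /* Scalar finish: covers the pair that contained a set bit and any odd tail word. */
    for (; k < n; ++k) {
        if (words[k] != 0)
            return (long long)(k * 64 + (size_t)__builtin_ctzll(words[k]));
    }
    return -1;
}
```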

Difference between the AVX instructions vxorpd and vpxor

vectorization intel xor simd avx

Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

windows assembly sse avx avx512
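A CPU can report AVX while the OS has not enabled saving of the YMM registers. For that reason, a usual check combines CPUID with XGETBV. The sketch below makes that check with GCC/Clang's `<cpuid.h>` and inline assembly, and the name `avx_usable` is hypothetical. On MSVC, `__cpuid` (from `<intrin.h>`) and `_xgetbv` play the same roles.

```c
#include <cpuid.h>

/* Hypothetical helper: 1 if the CPU supports AVX and the OS saves/restores
   YMM state, else 0. GCC/Clang on x86 only. */
static int avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                     /* CPUID leaf 1 not available */
    if (!(ecx & bit_OSXSAVE) || !(ecx & bit_AVX))
        return 0;                     /* no AVX, or XGETBV not enabled by the OS */
    unsigned int lo, hi;
    /* XGETBV with ECX = 0 reads XCR0; bit 1 (XMM state) and bit 2 (YMM state)
       must both be set for AVX to be usable. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x6u) == 0x6u;
}
```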

Are older SIMD versions available when using newer ones?

c++ c sse simd avx

How to get data out of AVX registers?

c++ visual-c++ avx fma

How to clear the upper 128 bits of a __m256 value?

c x86 simd avx avx2
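One way to zero the upper half is shown below. It blends in zeros with immediate 0xF0, which takes elements 4 to 7 from the second operand. `_mm256_insertf128_ps(v, _mm_setzero_ps(), 1)` works as well. By contrast, round-tripping through `_mm256_castps256_ps128` and `_mm256_castps128_ps256` is not reliable, because the intrinsic leaves the upper 128 bits undefined. The sketch assumes GCC/Clang's `target("avx")` attribute and a prior AVX check, and `clear_upper128` is a hypothetical name.

```c
#include <immintrin.h>

/* Hypothetical helper: out = in with elements 4..7 set to 0.0f. */
__attribute__((target("avx")))
static void clear_upper128(const float in[8], float out[8])
{
    __m256 v = _mm256_loadu_ps(in);
    /* Mask 0xF0: elements 4-7 come from the zero vector, elements 0-3 from v. */
    v = _mm256_blend_ps(v, _mm256_setzero_ps(), 0xF0);
    _mm256_storeu_ps(out, v);
}
```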

Generate code for multiple SIMD architectures

gcc simd avx sse4

Find index of maximum element in x86 SIMD vector

c++ x86 sse simd avx intel
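A common pattern for this has three steps. First, broadcast the maximum to every lane. Then compare it against the original vector. Finally, take the lowest bit of the resulting movemask. The sketch below uses SSE only and assumes GCC/Clang for `__builtin_ctz`. `argmax_ps` is a hypothetical name, and ties resolve to the first matching lane.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: index (0-3) of the first lane holding the maximum of v. */
static int argmax_ps(__m128 v)
{
    /* Reduce until every lane of m holds the maximum. */
    __m128 m = _mm_max_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
    m = _mm_max_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 0, 3, 2)));
    /* One bit per lane where v equals the broadcast maximum. */
    int mask = _mm_movemask_ps(_mm_cmpeq_ps(v, m));
    assert(mask != 0);                    /* zero only if v contains a NaN */
    return __builtin_ctz((unsigned)mask); /* lowest set bit = first matching lane */
}
```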

Is a practical BigNum implementation with AVX/SSE possible?

Why doesn't gcc resolve _mm256_loadu_pd as single vmovupd?

ASM x86_64 AVX: xmm and ymm register differences

assembly nasm x86-64 avx

Get index of first element that is not zero in a __m256 variable

c++ c sse simd avx

What's the point of the VPERMILPS instruction (_mm_permute_ps)?

Fast vectorized rsqrt and reciprocal with SSE/AVX depending on precision

performance sse simd avx
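A common way to trade precision for speed starts from the `rsqrtps` estimate, which has roughly 12 bits of precision. One Newton-Raphson step, y' = y(1.5 - 0.5·x·y²), then refines it to close to full single precision. The sketch below assumes SSE, inputs that are positive, finite and non-zero (zero yields NaN here), and the hypothetical names `rsqrt_nr_ps` and `rsqrt_array`.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: approximate 1/sqrt(x) for four floats. */
static __m128 rsqrt_nr_ps(__m128 x)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y = _mm_rsqrt_ps(x);                              /* ~12-bit estimate */
    __m128 hxyy = _mm_mul_ps(_mm_mul_ps(half, x), _mm_mul_ps(y, y));
    return _mm_mul_ps(y, _mm_sub_ps(three_halves, hxyy));    /* one Newton step */
}

/* Hypothetical helper: applies rsqrt_nr_ps to n floats; n must be a multiple of 4. */
static void rsqrt_array(const float *x, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        _mm_storeu_ps(out + i, rsqrt_nr_ps(_mm_loadu_ps(x + i)));
}
```

If more precision is needed, a second Newton step or `_mm_sqrt_ps` followed by a division are the usual alternatives. Which one is faster depends on the microarchitecture.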

Using __m256d registers

c++ x86 intel simd avx