
New posts in avx

When using a mask register with AVX-512 loads and stores, is a fault raised for invalid accesses to masked-out elements?

x86 avx avx512

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256?

Using AVX with GCC - avxintrin.h missing

c++ gcc avx

AVX/SSE version of xorshift128+

c performance sse avx

L1 memory bandwidth: 50% drop in efficiency using addresses which differ by 4096+64 bytes

c caching memory x86 avx

Is there an inverse instruction to the movemask instruction in Intel AVX2?

x86 intrinsics avx avx2 icc

Bitwise xor of two 256-bit integers

sse simd avx

Fastest Implementation of Exponential Function Using AVX

x86 simd avx exponential avx2

Get sum of values stored in __m256d with SSE/AVX

c++ optimization sse avx avx2

Why is GCC's AVX slower while LLVM's faster?

gcc assembly llvm julia avx

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructions?

c++ assembly sse avx avx2

8 bit shift operation in AVX2 with shifting in zeros

c sse simd avx avx2

Disabling AVX2 in CPU for testing purposes

Does the Linux kernel have its own SSE/AVX context?

Fastest way to expand bits in a field to all (overlapping + adjacent) set bits in a mask?

c assembly x86 sse avx

What's the difference between vextracti128 and vextractf128?

x86 simd avx avx2

Horizontal minimum and maximum using SSE

c++ max sse minimum avx

Using SIMD on amd64, when is it better to use more instructions vs. loading from memory?

Half-precision floating-point arithmetic on Intel chips