
New posts in avx

Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?

Is it okay to mix legacy SSE encoded instructions and VEX encoded ones in the same code path?

assembly x86 sse avx intel

Is it possible to use SIMD instructions in Rust?

rust simd avx avx2

When using a mask register with AVX-512 load and stores, is a fault raised for invalid accesses to masked out elements?

x86 avx avx512

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256?

Using AVX with GCC - avxintrin.h missing

c++ gcc avx

AVX/SSE version of xorshift128+

c performance sse avx

L1 memory bandwidth: 50% drop in efficiency using addresses which differ by 4096+64 bytes

c caching memory x86 avx

Is there an inverse instruction to the movemask instruction in Intel AVX2?

x86 intrinsics avx avx2 icc

Bitwise xor of two 256-bit integers

sse simd avx

Fastest Implementation of Exponential Function Using AVX

x86 simd avx exponential avx2

Get sum of values stored in __m256d with SSE/AVX

c++ optimization sse avx avx2

Why is GCC's AVX slower while LLVM's faster?

gcc assembly llvm julia avx

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructions?

c++ assembly sse avx avx2

8-bit shift operation in AVX2 with shifting in zeros

c sse simd avx avx2

Disabling AVX2 in CPU for testing purposes

Does the Linux kernel have its own SSE/AVX context?

Fastest way to expand bits in a field to all (overlapping + adjacent) set bits in a mask?

c assembly x86 sse avx

What's the difference between vextracti128 and vextractf128?

x86 simd avx avx2