New posts in avx

L1 memory bandwidth: 50% drop in efficiency using addresses which differ by 4096+64 bytes

c caching memory x86 avx

Is there an inverse instruction to the movemask instruction in Intel AVX2?

x86 intrinsics avx avx2 icc

Bitwise xor of two 256-bit integers

sse simd avx

Fastest Implementation of Exponential Function Using AVX

x86 simd avx exponential avx2

Get sum of values stored in __m256d with SSE/AVX

c++ optimization sse avx avx2

Why is GCC's AVX slower while LLVM's faster?

gcc assembly llvm julia avx

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructions?

c++ assembly sse avx avx2

8 bit shift operation in AVX2 with shifting in zeros

c sse simd avx avx2

Disabling AVX2 in CPU for testing purposes

Does the Linux kernel have its own SSE/AVX context?

Fastest way to expand bits in a field to all (overlapping + adjacent) set bits in a mask?

c assembly x86 sse avx

What's the difference between vextracti128 and vextractf128?

x86 simd avx avx2

Horizontal minimum and maximum using SSE

c++ max sse minimum avx

Using SIMD on amd64, when is it better to use more instructions vs. loading from memory?

Half-precision floating-point arithmetic on Intel chips

Unexpectedly good performance with OpenMP parallel for loop

Aligned and unaligned memory access with AVX/AVX2 intrinsics

gcc avx avx2

Efficiently find least significant set bit in a large array?

Difference between the AVX instructions vxorpd and vpxor

vectorization intel xor simd avx

Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

windows assembly sse avx avx512