New posts in avx512

Counting 1 bits (population count) on large data using AVX-512 or AVX-2

AVX-512 and Branching

Count leading zero bits for each element in AVX2 vector, emulate _mm256_lzcnt_epi32

What are the AVX-512 Galois-field-related instructions for?

avx512 galois-field

What are the differences between the compress and expand instructions in AVX-512?

assembly x86 simd avx512

GNU C inline asm input constraint for AVX512 mask registers (k1...k7)?

Do 128bit cross lane operations in AVX512 give better performance?

performance x86 intel avx avx512

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads a ZMM register, writing a k mask?

Does vzeroall zero registers ymm16 to ymm31?

assembly x86 intel avx avx512

What is the penalty of mixing EVEX and VEX encoded scheme?

assembly x86 simd avx512

Truth-table reduction to ternary logic operations, vpternlog

What is the most efficient way to clear a single or a few ZMM registers on Knights Landing?

How to convert a binary integer number to a hex string?

assembly x86 hex simd avx512

Fallback implementation for conflict detection in AVX2

c++ x86 intrinsics avx2 avx512

When using a mask register with AVX-512 load and stores, is a fault raised for invalid accesses to masked out elements?

x86 avx avx512

How do the Conflict Detection instructions make it easier to vectorize loops?

Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

windows assembly sse avx avx512

How to transpose a 16x16 matrix using SIMD instructions?