New posts in avx

How can I do efficiently bitwise majority voting on 3, 5, 7, 9 inputs with SSE/SSE2/AVX/...?

assembly sse avx neon avx512

avx three operands for sqrt?

Convention for displaying vector registers

x86 sse simd avx

How to further optimize matrix multiplication in llm.c project?

SIMD: Bit-pack signed integers

sse simd avx avx2 avx512

Logical shift between YMM registers

assembly x86-64 avx avx2 avx512

Code alignment in one object file is affecting the performance of a function in another object file

c assembly x86 nasm avx

Rust target-cpu=native gets slower SIMD execution

rust simd intrinsics avx

Accumulating Doubles Into Bins via intrinsics

c++ simd avx avx2

AVX2: Is there a way to implement _mm256_mul_epi8 function for a constant power of 2?

c++ simd intrinsics avx avx2

SIMD unpack 12-bit fields to 16-bit

First use of AVX 256-bit vectors slows down 128-bit vector and AVX scalar ops

assembly x86-64 sse simd avx

Aligning memory on 16-byte and 32-byte boundaries

memory alignment sse simd avx

Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?

c++ simd sse avx avx2

Performance of SSE and AVX when both are memory-bandwidth limited

performance caching sse avx

What happens when you execute an instruction that your CPU does not support?

linux x86-64 avx

Memory argument of VMOVDQU partially out of allocated range

How to convert int 64 to int 32 with avx (but without avx-512)

simd sse avx

AVX2 integer comparison for less than or equal

c integer compare avx avx2

Macro for generating immediates for AVX shuffle intrinsics

c macros intel intrinsics avx