
New posts in avx

First use of AVX 256-bit vectors slows down 128-bit vector and AVX scalar ops

assembly x86-64 sse simd avx

Aligning memory on 16-byte and 32-byte boundaries

memory alignment sse simd avx

Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?

c++ simd sse avx avx2

Performance of SSE and AVX when both are memory-bandwidth limited

performance caching sse avx

What happens when you execute an instruction that your CPU does not support?

linux x86-64 avx

Memory argument of VMOVDQU partially out of allocated range

How to convert int64 to int32 with AVX (but without AVX-512)

simd sse avx

AVX2 integer comparison for smaller equal

c integer compare avx avx2

Macro for generating immediates for AVX shuffle intrinsics

c macros intel intrinsics avx

optimising column-wise maximum with SIMD

c++ sse simd intrinsics avx

Find absolute value in AVX

Force compiler to use memory operand from intrinsics

c memory intrinsics avx operands

AVX-512 Instruction Encoding - {er} Meaning

assembly x86 avx avx512

Improving a recursive hadamard transformation

c simd avx

No insert and extract for float/double in SSE and AVX?

c++ floating-point sse simd avx

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?

Why is GCC so much worse than Clang at vectorizing a conditional multiply over std::vector<float>?

Vectorization of modulo multiplication

c++ algorithm sse simd avx

How to run bitwise OR on big vectors of u64 in the most performant manner?

c++ performance assembly cpu avx

_mm256_fmadd_ps is slower than _mm256_mul_ps + _mm256_add_ps?