New posts in sse

How to compile a project that requires SSE2 on a MacBook with an M1 chip?

Why is SIMD slower than its scalar counterpart?

assembly x86 sse simd

CVTTSD2SI, a truncating instruction, uses rounding with "inexact" results?

How to store four 32-bit floats in one 128-bit XMM register?

assembly x86 x86-64 sse simd

GCC vector extensions don't work as stated in the docs

gcc sse vectorization

How to move (up to) 16 single bytes into an XMM register?

assembly x86 intel sse simd

No insert and extract for float/double in SSE and AVX?

c++ floating-point sse simd avx

Auto-vectorize shuffle instruction

c sse avx2 auto-vectorization

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?

Reading SSE registers (XMM, YMM) in a signal handler

Why do x86 FP compares set CF like unsigned integers, instead of using signed conditions?

assembly x86 sse sse2 x87

Extract scalar value from SSE vector

c x86 sse simd

Penalty for switching from SSE to AVX?

c++ sse avx sse2

Shifting a __m128i using _mm_slli_epi64

c sse

GCC accessing memory above the stack top [duplicate]

assembly gcc x86-64 sse red-zone

SSE intrinsics: masking a float and using bitwise and?

c++ sse intrinsics

Vectorization of modulo multiplication

c++ algorithm sse simd avx

Does RSQRTSS break the dependency on the destination register?

_mm256_fmadd_ps is slower than _mm256_mul_ps + _mm256_add_ps?

Call libmvec functions manually on __m128 vectors?

c simd sse glibc intrinsics