New posts in sse

Are older SIMD-versions available when using newer ones?

c++ c sse simd avx

Find index of maximum element in x86 SIMD vector

c++ x86 sse simd avx intel

Practical BigNum AVX/SSE possible?

Is SSE floating-point arithmetic reproducible?

SIMD latency throughput

c++ performance x86 sse simd

Speed up float 5x5 matrix * vector multiplication with SSE

Flipping sign on packed SSE floats

Constexpr and SSE intrinsics

An SSE Stdlib-esque Library?

c++ c visual-c++ assembly sse

Best way to load a 64-bit integer to a double precision SSE2 register?

assembly double sse sse2 int64

Get index of first element that is not zero in a __m256 variable

c++ c sse simd avx

Does rewriting memcpy/memcmp/... with SIMD instructions make sense?

performance sse simd

Optimizing code using Intel SSE intrinsics for vectorization

c sse sse3 sse4

Intel Intrinsics guide - Latency and Throughput

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

x86 sse simd sse2 sse3

Fast vectorized rsqrt and reciprocal with SSE/AVX depending on precision

performance sse simd avx

Converting float vector to 16-bit int without saturating

c++ c performance sse

Load address calculation when using AVX2 gather instructions

x86 sse simd avx2

SIMD the following code

c x86 sse simd

Parallel prefix (cumulative) sum with SSE

c sum openmp sse