
New posts in simd

Scaling byte pixel values (y=ax+b) with SSE2 (as floats)?

c++ visual-studio x86 simd sse2

When should I use DO CONCURRENT and when OpenMP?

How to efficiently perform int8/int64 conversion with SSE?

c++ x86 sse simd intrinsics

Meaning of suffix "x" in intrinsics like "_mm256_set1_epi64x"

How to optimise this 8-bit positional popcount using assembly?

go assembly x86 simd avx

No speedup when summing uint16 vs uint64 arrays with NumPy?

SSE SIMD Optimization For Loop

visual-c++ sse simd

OpenCL distribution

NEON float multiplication is slower than expected

c++ gcc arm simd neon

Implicit SIMD (SSE/AVX) broadcasts with GCC

gcc sse simd avx

Fast SSE threshold algorithm

What is the floating-point (__m256d) version of the non-temporal streaming load intrinsic (_mm256_stream_load_si256)?

c++ x86 simd intrinsics avx2

How to speed up calculation of integral image?

Best way to shuffle across AVX lanes?

c++ x86 sse simd avx

GEMM kernel implemented using AVX2 is faster than AVX2/FMA on a Zen 2 CPU

SIMD C++ library

c++ gcc simd

How to store the contents of a __m128d simd vector as doubles without accessing it as a union?

c x86 simd intrinsics sse2

For an SSE vector that has all the same components, generate on the fly or precompute?

c++ sse simd avx

How to write C++ code that the compiler can efficiently compile to SSE or AVX?

Find the first instance of a character using SIMD

x86 sse simd avx avx2