New posts in intrinsics

Multiply 64-bit integers using .NET Core's hardware intrinsics

Different intrinsics behaviour depending on GCC version

What's the best way to load 2 unaligned 64-bit values into an SSE register with SSSE3?

sse simd intrinsics

Horizontal add with __m512 (AVX512)

simd intrinsics avx512

Divide 8-bit integers by 4 (or shift) using SSE

c++ x86 sse simd intrinsics

GCC (in any version) equivalent of clang's __type_pack_element to get Nth element of template parameter pack

How to convert scalar code of the double version of VDT's Pade Exp fast_ex() approx into SSE2?

c++ sse intrinsics sse2 exp

Converting between SSE and NEON Intrinsics-Shuffling

sse shuffle neon intrinsics

What is meant by "fixing up" floats?

simd intrinsics avx512

How to efficiently perform int8/int64 conversion with SSE?

c++ x86 sse simd intrinsics

Meaning of suffix "x" in intrinsics like "_mm256_set1_epi64x"

What is the floating-point (__m256d) version of the non-temporal streaming load intrinsic (_mm256_stream_load_si256)?

c++ x86 simd intrinsics avx2

How to store the contents of a __m128d simd vector as doubles without accessing it as a union?

c x86 simd intrinsics sse2

AVX2 sparse matrix multiplication

Difference between load1 and broadcast intrinsics

x86 sse simd intrinsics intel

Missing AVX-512 intrinsics for masks?

c gcc intrinsics icc avx512

How to optimize a cycle?

Extract set bytes position from SIMD vector

c++ sse simd intrinsics

How to load two sets of 4 shorts into an XMM register?

c++ x86 sse simd intrinsics

How can I access SHA intrinsic?

c hash sha intrinsics