New posts in intrinsics

Setting last or first n bits in SSE register

c++ x86 sse simd intrinsics

Compress mask using AVX intrinsics

c x86 simd intrinsics avx

Translating SSE to Neon: How to pack and then extract 32bit result

c++ arm sse neon intrinsics

AVX/SSE round floats down and return vector of ints?

c++ intel sse intrinsics avx

Shuffle AVX 256 Vector elements by 1 position left/right - C intrinsics

c sse hpc intrinsics avx

Benefits of using clang builtins vs standard functions

c++ gcc clang intrinsics

What is the inverse of "_mm256_cvtepi16_epi32"

x86 g++ intrinsics avx avx2

Intel SIMD - How can I check if an __m256* contains any non-zero values

c++ simd intrinsics avx

Are there ARM intrinsics for add-with-carry in C?

arm intrinsics

What does "vperm v0,v0,v0,v17" with unused v0 do?

c++ gcc sha intrinsics powerpc

What is the difference between loadu_ps and set_ps when using unformatted data?

sse simd intrinsics sse2

How do I broadcast the lowest word of a __m256i?

intrinsics avx2

What is the difference between _mm_movehdup_ps and _mm_shuffle_ps in this case?

How to emulate _mm256_loadu_epi32 with gcc or clang?

c++ c intrinsics avx512

c++ AVX512 intrinsic equivalent of _mm256_broadcast_ss()?

c++ intel intrinsics avx2 avx512

How to improve performance of following loop

Why should you not access the __m128i fields directly?

c++ sse intrinsics

Issues with intel intrinsics

c intel intrinsics

How to increment a vector in AVX/AVX2

AVX 4-bit integers