New posts in avx2

Compiling an AVX2 program

c gcc avx avx2

How to divide a __m256i vector by an integer variable?

optimization x86 simd avx avx2

What is the fastest way to count the number of nonzero entries in an __m256 vector?

algorithm vector simd avx avx2

Fastest way to set __m256 value to all ONE bits

How to implement lane-crossing logical bitwise shift/rotate (left and right) in AVX2

c++ c avx2

Convert signed short to float in C++ SIMD

c++ sse simd avx2

Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2

c intrinsics avx avx2 avx512

Is it really efficient to use the Karatsuba algorithm for 64-bit x 64-bit multiplication?

What is the reason for AVX floating-point bitwise logical operations?

c++ simd avx avx2

gdb reverse debugging with AVX2

c gdb glibc avx2

uint32_t * uint32_t = uint64_t vector multiplication with gcc

c gcc vectorization avx2 gcc9

Getting GCC to generate a PTEST instruction when using vector extensions

c gcc vectorization sse avx2

How to do _mm256_maskstore_epi8() in C/C++?

c++ simd intrinsics avx avx2

AVX2 byte gather with uint16 indices, into a __m256i

c intrinsics avx pack avx2

Efficient (on Ryzen) way to extract the odd elements of a __m256 into a __m128?

What is the floating-point (__m256d) version of the non-temporal streaming load intrinsic (_mm256_stream_load_si256)?

c++ x86 simd intrinsics avx2

Find the first instance of a character using simd

x86 sse simd avx avx2

AVX2 instructions latency and throughput

performance x86 x86-64 simd avx2

Intel IACA analyzer alters assembly?

assembly simd avx2 iaca