New posts in sse

SIMD SSE2: __m128i contains 4 int32_t, how to quickly find each integer that is bigger or smaller than 0

c x86 sse simd sse2

Initialize __m256i from 64 high or low bits of four __m128i variables

c++ sse simd avx avx2

On uint64 to double conversion: Why is the code simpler after a shift right by 1?

What is the point of MOVAPS in x86 if it does the same as MOVUPS in modern computers?

assembly x86 sse

Why does adding an xorps instruction make this function using cvtsi2ss and addss ~5x faster?

Array vs pointer auto-vectorization in gcc

c++ gcc sse auto-vectorization

C++: How to prevent default constructor using AVX for initialisation

c++ constructor x86 sse avx

SSE4.1 slower than SSE3 on 4x4 matrix multiplication?

c++ matrix simd sse matmul

Load two 64-bit integers into lower & upper xmm, respectively

assembly sse cpu-registers

Using C union with SSE intrinsics in Cython results in SIGSEGV

python c cython sse

Efficiently Set Lowest 64 Bits of YMM Register to Constant

Add uchar values in ushort array with SSE or SSE3

Twice as slow SIMD performance without extra copy

SSE - Nonexistent haddsub intrinsic?

sse simd intrinsics

SSE: How to reduce a _m128i._i32[4] to _m128i._i8

c++ x86 sse simd

Is there a way to increase a value in a xmm register?

assembly x86 addition sse

SSE optimisation for a loop that finds zeros in an array and toggles a flag + updates another array

c++ optimization x86 sse simd

What are the names and meanings of the intrinsic vector element types, like epi64x or pi32?

intel sse intrinsics sse2 mmx