New posts in sse

Forcing AVX intrinsics to use SSE instructions instead

Difference between load1 and broadcast intrinsics

x86 sse simd intrinsics intel

Extracting a shuffled 32-bit SSE value with only SSE2

c optimization sse

SSE and AVX intrinsics mixture

c++ performance sse simd avx

How does endianness work with SIMD registers?

x86 sse endianness simd

Is there a more direct method to convert float to int with rounding than adding 0.5f and converting with truncation?

Transpose for 8 registers of 16-bit elements on SSE2/SSSE3

assembly matrix x86 sse simd

How to convert a hex float to a float in C/C++ using the _mm_extract_ps SSE GCC intrinsic function

c++ gcc floating-point hex sse

Cannot use SSSE3 on an SSSE3-enabled CPU

c linux ubuntu intel sse

Segmentation fault while working with SSE intrinsics due to incorrect memory alignment

c memory sse icc

Why is permute needed in parallel SIMD/SSE/AVX?

permutation sse simd avx

Extract the positions of set bytes from a SIMD vector

c++ sse simd intrinsics

Fastest way to horizontally sum SSE unsigned byte vector

c++ x86 sse simd

Shifting 4 integers right by different values with SIMD

c++ x86 sse simd avx

How to vectorize range check during block copy?

c++ vectorization sse avx

What do SSE instructions optimize in practice, and how does the compiler enable and use them?

c++ c assembly sse

64-bit features in a 32-bit application?

How to load two sets of 4 shorts into an XMM register?

c++ x86 sse simd intrinsics

Accumulate a vector of integers with SSE

c++ vector x86 sse simd

Are there unsigned equivalents of the x87 FILD and SSE CVTSI2SD instructions?