
New posts in sse

Moving 2 QWORDs from general purpose registers into an XMM register as high/low [duplicate]

assembly x86-64 masm sse

Fast way to set single bit in SSE datatypes (__m128i)?

c++ bit-manipulation intel sse

Different results with and without SSE (float arrays multiplication)

c++ arrays floating-point sse

C++ load and store optimizations and heap objects

c++ sse simd

AVX vs. SSE: expect to see a larger speedup

performance sse simd avx

Is there a way to mask one end of a __m128i register based on mask length that is not known at compile time?

sse simd avx

Using SSE to speed up lower_bound function

c assembly x86 x86-64 sse

How can I optimize conversion from half-precision float16 to single-precision float32?

Why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

Detecting SIMD instruction sets to be used with C++ Macros in Visual Studio 2015

Efficient SSE NxN matrix multiplication

Non-temporal stores of portions of a packed double vector using SSE/AVX

caching x86 x86-64 sse avx

Optimization using prefetch

optimization assembly sse

What's the point of _mm_cmpgt_sd and other similar methods?

x86 sse simd intrinsics

What is the minimum version of OS X for use with AVX/AVX2?

macos sse avx avx2

How to set all elements in a __m256d to, say, the 3rd element of another __m256d?

sse avx

What is the difference between loadu and load?

assembly x86 sse simd intrinsics

SSE operation on 4 arrays of integer size

c assembly sse simd intrinsics

How do I extract a single byte from an xmm-register in Asm?