
New posts in sse

Optimize extraction of 64 bit value from AVX2 register

c sse avx avx2

Can a movss instruction be used to replace integer data?

c++ assembly vector sse

Floating-point number vs fixed-point number: speed on Intel I5 CPU

What is the difference between loadu_ps and set_ps when using unformatted data?

sse simd intrinsics sse2

Get an arbitrary float from a simd register at runtime?

x86 sse simd avx avx2

What is the difference between _mm_movehdup_ps and _mm_shuffle_ps in this case?

Why is using AVX ymm (m256) instructions ~4 times slower than xmm (m128)?

What is the point of SSE2 instructions such as orpd?

How much effort do you have to put in to get gains from using SSE?

c++ sse

Where is _mm_prefetch in Visual Studio 2012?

c++ sse prefetch

How to improve performance of the following loop

Clear upper bytes of __m128i

Why should you not access the __m128i fields directly?

c++ sse intrinsics

C/C++: -msse and -msse2 Flags do not have any effect on the binaries?

c++ gcc sse sse2

How to truncate float values in XMM register

c++ c assembly sse

AVX2 float compare and get 0.0 or 1.0 instead of all-0 or all-one bits

c++ sse simd avx avx2

How to move a floating-point constant value into an xmm register?

assembly x86 sse

Multiply-add vectorization slower with AVX than with SSE

SSE (SIMD extensions) support in gcc

gcc sse simd

How do you get maximal speed out of SSE?