New posts in sse

Common SIMD techniques

arm sse simd neon mmx

_mm_load_ps vs. _mm_load_pd vs. etc on Intel x86 ISA

c x86 intel sse simd

GCC SSE code optimization

Push XMM register to the stack

assembly x86 simd sse

Is it possible to cast floats directly to __m128 if they are 16 byte aligned?

c++ c alignment sse intrinsics

Is NOT missing from SSE, AVX?

How to solve the 32-byte-alignment issue for AVX load/store operations?

How to take the absolute value of 2 doubles or 4 floats using the SSE instruction set? (Up to SSE4)

gcc sse

AVX VMOVDQA slower than two SSE MOVDQA?

Adding the components of an SSE register

Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math

How to sum __m256 horizontally?

Loop unrolling to achieve maximum throughput with Ivy Bridge and Haswell

c++ x86 intel sse avx

Is it fair to compare SSE/AVX units to GPU cores?

cuda hardware opencl gpu sse

Fastest way to compute absolute value using SSE

Can't get over 50% max. theoretical performance on matrix multiply

c optimization matrix openmp sse

SSE 4 instructions generated by Visual Studio 2013 Update 2 and Update 3

How to rotate an SSE/AVX vector

c x86 sse intrinsics avx

Why do some SSE "mov" instructions specify that they move floating-point values?

assembly x86 sse

How to implement "_mm_storeu_epi64" without aliasing problems?