
Getting started with Intel x86 SSE SIMD instructions

Tags: c, x86, gcc, simd, sse

I want to learn more about using SSE.

What ways are there to learn, besides the obvious one of reading the Intel® 64 and IA-32 Architectures Software Developer's Manuals?

Mainly I'm interested in working with the GCC X86 Built-in Functions.

Liran Orevi asked Sep 07 '09 14:09




1 Answer

First, I don't recommend using the built-in functions: they are not portable, even across compilers for the same architecture.

Use intrinsics instead. GCC does a wonderful job of optimizing SSE intrinsics into even more optimized code, and you can always peek at the generated assembly to see how to use SSE to its full potential.

Intrinsics are easy - just like normal function calls:

    #include <immintrin.h>  // portable to all x86 compilers

    int main()
    {
        // _mm_set_ps takes the high element first, the opposite of C array order.
        // Use _mm_setr_ps if you want "little endian" element order in the source.
        __m128 vector1 = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 vector2 = _mm_set_ps(7.0f, 8.0f, 9.0f, 0.0f);

        __m128 sum = _mm_add_ps(vector1, vector2); // sum = vector1 + vector2, element by element
        (void)sum; // unused here; silences the compiler warning

        vector1 = _mm_shuffle_ps(vector1, vector1, _MM_SHUFFLE(0,1,2,3));
        // vector1 is now (1, 2, 3, 4) in the same high-first notation (the shuffle reversed it)
        return 0;
    }
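To make the element order concrete, here is a minimal sketch that stores each vector back to a plain float array so the lanes can be inspected. The helper names (`demo_set`, `demo_setr`, `demo_reverse`) are made up for illustration and are not part of any library; only the `_mm_*` intrinsics and `_MM_SHUFFLE` come from `<immintrin.h>`.

```c
#include <immintrin.h>

/* Hypothetical helper: build the vector with _mm_set_ps (high element first)
   and copy its lanes to out[0..3], lowest lane first. */
void demo_set(float out[4])
{
    __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    _mm_storeu_ps(out, v);   /* unaligned store, out needs no special alignment */
}

/* Hypothetical helper: same arguments, but _mm_setr_ps takes them in
   memory (array) order, so the first argument lands in lane 0. */
void demo_setr(float out[4])
{
    __m128 v = _mm_setr_ps(4.0f, 3.0f, 2.0f, 1.0f);
    _mm_storeu_ps(out, v);
}

/* Hypothetical helper: reverse the four lanes with the shuffle from the answer. */
void demo_reverse(float out[4])
{
    __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_ps(out, v);
}
```

Calling `demo_set(out)` on a `float out[4]` should leave the array as {1, 2, 3, 4}, while `demo_setr` and `demo_reverse` should both leave {4, 3, 2, 1}: the first argument of `_mm_set_ps` ends up in the highest lane, which is the last array slot after a store.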

Use _mm_load_ps or _mm_loadu_ps to load data from arrays. _mm_load_ps requires the address to be 16-byte aligned; _mm_loadu_ps works with any alignment.
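As a rough sketch of loading from arrays, the function below adds two float arrays four elements at a time. The name `add_arrays` and its signature are assumptions made for this example. It uses the unaligned load and store intrinsics so it makes no assumption about how the caller allocated the buffers, and it finishes any leftover elements with a scalar loop.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main loop: process four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load a[i..i+3], any alignment */
        __m128 vb = _mm_loadu_ps(b + i);   /* load b[i..i+3], any alignment */
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Tail loop: handle the remaining 0 to 3 elements one at a time. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

If the buffers are known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could be swapped in for the vector loop; the unaligned versions are used here only because they are correct for any pointer.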

Of course, there are many more options. SSE is really powerful and, in my opinion, relatively easy to learn.

See also https://stackoverflow.com/tags/sse/info for some links to guides.

LiraNuna answered Sep 28 '22 03:09