
How to store the contents of a __m128d SIMD vector as doubles without accessing it as a union?

The code I want to optimize is basically a simple but large arithmetic formula. It should be fairly simple to analyze the code automatically and compute the independent multiplications and additions in parallel, but I have read that autovectorization only works for loops.

I have read several times that accessing single elements of a vector via a union or some other way should be avoided at all costs, and should instead be replaced by a _mm_shuffle_pd (I'm working on doubles only)...

I can't figure out how to store the contents of a __m128d vector as doubles without accessing it as a union. Also, does an operation like this give any performance gain compared to scalar code?

union {
  __m128d v;
  double d[2];
} vec;
union {
  __m128d v;
  double d[2];
} vec2;

vec.v = index1;
vec2.v = index2;
temp1 = _mm_mul_pd(temp1, _mm_set_pd(bvec[vec.d[1]], bvec[vec2.d[1]]));

Also, the two unions look ridiculously ugly. I tried using a named union instead:

union dvec {
  __m128d v;
  double d[2];
} vec;

However, when I tried to declare the indexX variables as dvec, the compiler complained that dvec is undeclared.
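The "undeclared" error is consistent with how C treats `union dvec { ... }`: it introduces only a tag, so the type has to be spelled `union dvec` unless a typedef is added. Below is a minimal sketch, assuming plain C (not C++) and SSE2. The helper `dvec_lane` is a hypothetical name used only for illustration.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_pd */

/* With the typedef, dvec is a type name and can be used directly
   in declarations such as "dvec index1;". */
typedef union {
    __m128d v;
    double d[2];
} dvec;

/* Hypothetical helper: read lane i (0 = low, 1 = high) through the union. */
static double dvec_lane(__m128d x, int i) {
    dvec u;
    u.v = x;
    return u.d[i];
}
```

With this in place, `dvec index1;` should compile, and so would `union dvec index1;` with the original tagged declaration.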

the_toast asked Sep 19 '12 13:09


2 Answers

Unfortunately if you look at MSDN it says the following:

You should not access the __m128d fields directly. You can, however, see these types in the debugger. A variable of type __m128 maps to the XMM[0-7] registers.

I'm no expert in SIMD, but this tells me that what you're doing won't work, as it's just not designed for that.

EDIT:

I've just found this, and it says:

Use __m128, __m128d, and __m128i only on the left-hand side of an assignment, as a return value, or as a parameter. Do not use it in other arithmetic expressions such as "+" and ">>".

It also says:

Use __m128, __m128d, and __m128i objects in aggregates, such as unions (for example, to access the float elements) and structures.

So maybe you can use them, but only in unions. That seems to contradict what MSDN says, however.

EDIT2:

Here is another interesting resource that describes, with examples, how to use these SIMD types.

In the above link, you'll find this line:

#include <math.h>
#include <emmintrin.h>
double in1_min(__m128d x)
{
    return x[0];
}

In the above we use a new extension in gcc 4.6 to access the high and low parts via indexing. Older versions of gcc require using a union and writing to an array of two doubles. This is cumbersome, and extra slow when optimization is turned off.

Tony The Lion answered Sep 20 '22 07:09


_mm_cvtsd_f64 + _mm_unpackhi_pd

For doubles:

#include <assert.h>

#include <x86intrin.h>

int main(void) {
    __m128d x = _mm_set_pd(1.5, 2.5);
    /* _mm_cvtsd_f64 + _mm_unpackhi_pd */
    assert(_mm_cvtsd_f64(x) == 2.5);
    assert(_mm_cvtsd_f64(_mm_unpackhi_pd(x, x)) == 1.5);
}
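If both lanes are needed in memory, a store intrinsic is another way to avoid the union. This is a minimal sketch assuming SSE2 only. `store_lanes` and `high_lane` are hypothetical helper names, not part of any library.

```c
#include <emmintrin.h>  /* SSE2: _mm_storeu_pd, _mm_storeh_pd */

/* Hypothetical helper: copy both lanes into out[0] (low) and out[1] (high).
   _mm_storeu_pd has no alignment requirement on the destination. */
static void store_lanes(__m128d x, double out[2]) {
    _mm_storeu_pd(out, x);
}

/* Hypothetical helper: write only the high lane to a scalar double. */
static double high_lane(__m128d x) {
    double hi;
    _mm_storeh_pd(&hi, x);
    return hi;
}
```

For example, after `store_lanes(_mm_set_pd(1.5, 2.5), out)`, `out[0]` holds 2.5 and `out[1]` holds 1.5, matching the lane order in the asserts above.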

For floats, I have posted the following examples at "How to convert a hex float to a float in C/C++ using _mm_extract_ps SSE GCC instrinc function":

  • _mm_cvtss_f32 + _mm_shuffle_ps
  • _MM_EXTRACT_FLOAT

For ints you can use _mm_extract_epi32 (an SSE4.1 intrinsic):

#include <assert.h>

#include <x86intrin.h>

int main(void) {
    __m128i x = _mm_set_epi32(1, 2, 3, 4);
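    /* _mm_set_epi32 takes its arguments from the highest element (index 3)
       down to the lowest (index 0). */
    assert(_mm_extract_epi32(x, 3) == 1);
    assert(_mm_extract_epi32(x, 2) == 2);
    assert(_mm_extract_epi32(x, 1) == 3);
    assert(_mm_extract_epi32(x, 0) == 4);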
}

GitHub upstream.

Compile and run examples with:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out

Tested on Ubuntu 19.04 amd64.