How to store the contents of a __m128d simd vector as doubles without accessing it as a union?

Question

The code I want to optimize is basically a simple but large arithmetic formula. It should be fairly simple to analyze the code automatically and compute the independent multiplications/additions in parallel, but I have read that autovectorization only works for loops.

I have read several times now that accessing single elements of a vector via a union or some other means should be avoided at all costs and replaced by a _mm_shuffle_pd instead (I'm working with doubles only).

I can't figure out how to store the contents of a __m128d vector as doubles without accessing it as a union. Also, does an operation like this give any performance gain compared to scalar code?

union {
  __m128d v;
  double d[2];
} vec;
union {
  __m128d v;
  double d[2];
} vec2;

vec.v = index1;
vec2.v = index2;
/* the lanes hold indices stored as doubles, so cast before subscripting */
temp1 = _mm_mul_pd(temp1, _mm_set_pd(bvec[(int)vec.d[1]], bvec[(int)vec2.d[1]]));

Also, the two unions look ridiculously ugly, but when using

union dvec {
  __m128d v;
  double d[2];
} vec;

and trying to declare the indexX variables as dvec, the compiler complained that dvec is undeclared.

Tony The Lion · Accepted Answer

Unfortunately if you look at MSDN it says the following:

You should not access the __m128d fields directly. You can, however, see these types in the debugger. A variable of type __m128 maps to the XMM[0-7] registers.

I'm no expert in SIMD, but this tells me that what you're doing won't work, because the type just isn't designed to be used that way.

EDIT:

I've just found this, and it says:

Use __m128, __m128d, and __m128i only on the left-hand side of an assignment, as a return value, or as a parameter. Do not use it in other arithmetic expressions such as "+" and ">>".

It also says:

Use __m128, __m128d, and __m128i objects in aggregates, such as unions (for example, to access the float elements) and structures.

So maybe you can access the elements, but only through unions. This seems to contradict what MSDN says, however.
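A minimal sketch of the union approach that the quoted guideline describes, assuming the code is compiled as C (C99 or later) with SSE2 available. The names dvec and dvec_lane are hypothetical. The typedef also bears on the "dvec is undeclared" error from the question: in C, `union dvec { ... }` only declares a tag, so the type must be written as `union dvec` unless a typedef is provided.

```c
#include <emmintrin.h>  /* SSE2: __m128d */

/* Hypothetical type name. The typedef lets us write `dvec` without
   the `union` keyword, which plain C otherwise requires. */
typedef union {
    __m128d v;
    double d[2];
} dvec;

/* Return lane i (0 = low, 1 = high) by reading the other union member. */
static double dvec_lane(__m128d x, int i) {
    dvec u;
    u.v = x;
    return u.d[i];
}
```

With this, a single `dvec` type can be reused for both index vectors instead of repeating the anonymous union.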

EDIT2:

Here is another interesting resource that shows, with examples, how to use these SIMD types.

In the above link, you'll find this example:

#include <math.h>
#include <emmintrin.h>
double in1_min(__m128d x)
{
    return x[0];
}

In the above we use a new extension in gcc 4.6 to access the high and low parts via indexing. Older versions of gcc require using a union and writing to an array of two doubles. This is cumbersome, and extra slow when optimization is turned off.
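As a possible alternative to both the gcc indexing extension and the union, the vector can be written to an ordinary array with a store intrinsic. The sketch below assumes SSE2; the helper name store_lanes is hypothetical and not taken from the linked resource.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_storeu_pd */

/* Hypothetical helper: copy both lanes of x into out,
   low lane to out[0] and high lane to out[1]. */
static void store_lanes(__m128d x, double out[2]) {
    /* Unaligned store: out does not need 16-byte alignment. */
    _mm_storeu_pd(out, x);
}
```

If the destination is known to be 16-byte aligned, _mm_store_pd could be used instead.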

Ciro Santilli 新疆再教育营六四事件法轮功郝海东 · Answer

_mm_cvtsd_f64 + _mm_unpackhi_pd

For doubles:

#include <assert.h>

#include <x86intrin.h>

int main(void) {
    __m128d x = _mm_set_pd(1.5, 2.5);
    /* _mm_cvtsd_f64 + _mm_unpackhi_pd */
    assert(_mm_cvtsd_f64(x) == 2.5);
    assert(_mm_cvtsd_f64(_mm_unpackhi_pd(x, x)) == 1.5);
}
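Applied to the lookup in the question, the same two intrinsics might be used roughly as follows. This is only a sketch: the helper names and the table parameter are hypothetical, and it assumes the index lanes hold non-negative whole numbers stored as doubles that are within the bounds of the table.

```c
#include <emmintrin.h>  /* SSE2 */

/* Low and high lane of a __m128d, without a union. */
static double low_lane(__m128d x)  { return _mm_cvtsd_f64(x); }
static double high_lane(__m128d x) { return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x)); }

/* Hypothetical helper: look up table[] at the two indices held in idx
   and return the results in the matching lanes. */
static __m128d lookup2(const double *table, __m128d idx) {
    int i0 = _mm_cvttsd_si32(idx);                        /* low lane, truncated to int  */
    int i1 = _mm_cvttsd_si32(_mm_unpackhi_pd(idx, idx));  /* high lane, truncated to int */
    return _mm_set_pd(table[i1], table[i0]);              /* arguments are (high, low)   */
}
```

Whether this beats scalar code depends on how much arithmetic surrounds the lookups, since the table loads themselves remain scalar.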

For floats, I have posted the following examples at "How to convert a hex float to a float in C/C++ using _mm_extract_ps SSE GCC instrinc function":

_mm_cvtss_f32 + _mm_shuffle_ps
_MM_EXTRACT_FLOAT
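A minimal sketch of the first combination, _mm_cvtss_f32 + _mm_shuffle_ps, using only SSE. The helper names are hypothetical. The shuffle control must be a compile-time constant, so each lane gets its own function.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _mm_cvtss_f32 */

/* Lane 0 is the lowest element and can be read directly. */
static float float_lane0(__m128 x) { return _mm_cvtss_f32(x); }

/* For the other lanes, broadcast the wanted element into lane 0 first. */
static float float_lane1(__m128 x) {
    return _mm_cvtss_f32(_mm_shuffle_ps(x, x, _MM_SHUFFLE(1, 1, 1, 1)));
}
static float float_lane2(__m128 x) {
    return _mm_cvtss_f32(_mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 2, 2, 2)));
}
static float float_lane3(__m128 x) {
    return _mm_cvtss_f32(_mm_shuffle_ps(x, x, _MM_SHUFFLE(3, 3, 3, 3)));
}
```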

For ints you can use _mm_extract_epi32:

#include <assert.h>

#include <x86intrin.h>

int main(void) {
    __m128i x = _mm_set_epi32(1, 2, 3, 4);
    /* _mm_set_epi32 takes its arguments from the highest lane down to lane 0 */
    assert(_mm_extract_epi32(x, 3) == 1);
    assert(_mm_extract_epi32(x, 2) == 2);
    assert(_mm_extract_epi32(x, 1) == 3);
    assert(_mm_extract_epi32(x, 0) == 4);
}
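Note that _mm_extract_epi32 is an SSE4.1 intrinsic. Where only SSE2 can be assumed, one possible fallback is to store the vector to an array and index that; the helper name int_lane below is hypothetical.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_storeu_si128 */

/* Hypothetical helper: return 32-bit lane i (0 = lowest) of x
   by storing the whole vector to a temporary array. */
static int int_lane(__m128i x, int i) {
    int out[4];
    _mm_storeu_si128((__m128i *)out, x);  /* unaligned store of all four lanes */
    return out[i];
}
```

Unlike _mm_extract_epi32, the lane index here does not have to be a compile-time constant.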

GitHub upstream.

Compile and run examples with:

gcc -ggdb3 -O0 -std=c99 -msse4.1 -Wall -Wextra -pedantic -o main.out main.c
./main.out

Tested on Ubuntu 19.04 amd64.

Tags: c, x86, simd, intrinsics, sse2

the_toast
