
Where do SSE2 intrinsics store results?

I'm taking my first steps with SSE2 in C++. Here's the intrinsic I'm learning right now:

__m128d _mm_add_pd (__m128d a, __m128d b)

The documentation says: Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.

But I never pass dst to that function. So how can it add the two doubles I pass (via pointer) and write them to a result array if I never pass that array?

asked Nov 16 '25 16:11 by markzzz


1 Answer

The intrinsic returns the result of the computation, so you can store it in a variable or use it as another parameter.
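To make that concrete, here is a minimal sketch in which the "dst" from the documentation is simply the returned __m128d. The helper name add_then_scale and its parameters are hypothetical, chosen only for illustration; the intrinsics themselves are standard SSE2.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cassert>

// Hypothetical helper: computes (a[i] + b[i]) * scale for i = 0, 1.
std::array<double, 2> add_then_scale(const double* a, const double* b, double scale) {
    assert(a != nullptr && b != nullptr);
    __m128d va = _mm_loadu_pd(a);           // load two doubles from a
    __m128d vb = _mm_loadu_pd(b);           // load two doubles from b
    __m128d sum = _mm_add_pd(va, vb);       // "dst" is the return value
    // The returned value can be fed straight into another intrinsic.
    __m128d scaled = _mm_mul_pd(sum, _mm_set1_pd(scale));
    std::array<double, 2> out;
    _mm_storeu_pd(out.data(), scaled);      // write both lanes back to memory
    return out;
}
```

Nothing is written to memory until the explicit _mm_storeu_pd call; before that, sum and scaled are ordinary values of type __m128d.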

An important thing to note here is that most SIMD intrinsics don't operate directly on memory: you need to explicitly load (_mm_load(u)_pd) and store (_mm_store(u)_pd) the double values, much as you would in assembly. The intermediate values will most likely be kept in SSE registers or, if too many registers are in use, spilled to the stack.

So if you wanted to sum up two double arrays, you would do something like this:

#include <emmintrin.h>            // SSE2 intrinsics

double a[N];
double b[N];
double c[N];
int i = 0;
for (; i + 1 < N; i += 2) {       // We load two doubles every time
    auto x = _mm_loadu_pd(a + i); // We don't know anything about alignment
    auto y = _mm_loadu_pd(b + i); // So I assume the load is unaligned
    auto sum = _mm_add_pd(x, y);  // Compute the vector sum
    _mm_storeu_pd(c + i, sum);    // The store is unaligned as well
}
if (i < N) c[i] = a[i] + b[i];    // Scalar tail for the last element when N is odd
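
If the arrays are known to be 16-byte aligned, the aligned variants _mm_load_pd and _mm_store_pd can be used instead of the unaligned ones. The following is only a sketch under that assumption; the function name add_arrays_aligned is hypothetical, and the asserts are there to make the alignment requirement explicit.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n).
// Assumes a, b and c all point to 16-byte aligned storage.
void add_arrays_aligned(const double* a, const double* b, double* c, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(c) % 16 == 0);
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {                 // two doubles per iteration
        __m128d x = _mm_load_pd(a + i);         // aligned load
        __m128d y = _mm_load_pd(b + i);         // aligned load
        _mm_store_pd(c + i, _mm_add_pd(x, y));  // aligned store of the sum
    }
    if (i < n) c[i] = a[i] + b[i];              // scalar tail when n is odd
}
```

Stack or static arrays can be given the required alignment with alignas(16), for example alignas(16) double a[N];. Because i advances in steps of two doubles (16 bytes), every a + i stays aligned as long as the base pointer is.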

answered Nov 19 '25 09:11 by Tobias Ribizel


