I have some code in a loop:
for(int i = 0; i < n; i++)
{
    u[i] = c * u[i] + s * b[i];
}
So, u and b are vectors of the same length, and c and s are scalars. Is this loop a good candidate for SSE vectorization, in order to get a speedup?
UPDATE
I learnt vectorization (turns out it's not so hard if you use intrinsics) and implemented my loop in SSE. However, when setting the SSE2 flag in the VC++ compiler, I get about the same performance as with my own SSE code. The Intel compiler, on the other hand, was much faster than both my SSE code and the VC++ compiler.
Here is the code I wrote, for reference:
#include <emmintrin.h>   // SSE2 double-precision intrinsics
#include <malloc.h>      // _aligned_malloc

double *u = (double*) _aligned_malloc(n * sizeof(double), 16);
for(int i = 0; i < n; i++)
{
    u[i] = 0;
}

// omegaCache holds the b values from the loop above
int j = 0;
__m128d *uSSE = (__m128d*) u;
__m128d cStore = _mm_set1_pd(c);    // broadcast c into both lanes
__m128d sStore = _mm_set1_pd(s);    // broadcast s into both lanes
for (j = 0; j <= n - 2; j += 2)     // two doubles per iteration
{
    __m128d uStore     = _mm_set_pd(u[j+1], u[j]);
    __m128d omegaStore = _mm_set_pd(omegaCache[j+1], omegaCache[j]);
    __m128d cu = _mm_mul_pd(cStore, uStore);
    __m128d so = _mm_mul_pd(sStore, omegaStore);
    uSSE[j/2] = _mm_add_pd(cu, so);
}
for(; j < n; ++j)                   // scalar cleanup for an odd trailing element
{
    u[j] = c * u[j] + s * omegaCache[j];
}
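As an aside on the loads: since u comes from _aligned_malloc with 16-byte alignment, the element-by-element _mm_set_pd can be replaced with a single vector load per pair. A small sketch of that variant; I'm assuming nothing about omegaCache's alignment, hence the unaligned _mm_loadu_pd for it:

for (j = 0; j <= n - 2; j += 2)
{
    __m128d uStore     = _mm_load_pd(&u[j]);           // aligned load: u came from _aligned_malloc(..., 16)
    __m128d omegaStore = _mm_loadu_pd(&omegaCache[j]); // unaligned load, alignment of omegaCache is unknown
    uSSE[j/2] = _mm_add_pd(_mm_mul_pd(cStore, uStore),
                           _mm_mul_pd(sStore, omegaStore));
}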
NumPy, for example, comes with a flexible mechanism that allows it to harness the SIMD features of the CPU in order to provide faster and more stable performance on all popular platforms. Currently, NumPy supports the x86, IBM/Power, ARMv7 and ARMv8 architectures.
The GNU Compiler Collection, gcc, offers multiple ways to perform SIMD calculations.
One approach to leveraging vector hardware is SIMD intrinsics, available in all modern C or C++ compilers. SIMD stands for "single instruction, multiple data". SIMD instructions are available on many platforms; there's a good chance your smartphone has them too, through the ARM NEON architecture extension.
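Another of those ways is gcc's vector extensions, which let you express the same update without writing intrinsics by hand. A minimal sketch, assuming gcc or clang; the typedef v2df and the function name update are placeholders of mine, not part of the question:

/* GNU C vector extension: a 16-byte vector holding two doubles */
typedef double v2df __attribute__ ((vector_size (16)));

void update(double *u, const double *b, double c, double s, int n)
{
    v2df cv = {c, c};   /* broadcast the scalars into both lanes */
    v2df sv = {s, s};
    int i = 0;
    for (; i + 2 <= n; i += 2)
    {
        v2df uv, bv;
        __builtin_memcpy(&uv, &u[i], sizeof uv);   /* unaligned-safe loads */
        __builtin_memcpy(&bv, &b[i], sizeof bv);
        uv = cv * uv + sv * bv;                    /* element-wise vector arithmetic */
        __builtin_memcpy(&u[i], &uv, sizeof uv);
    }
    for (; i < n; i++)                             /* scalar tail */
        u[i] = c * u[i] + s * b[i];
}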
Yes, this is an excellent candidate for vectorization. But before you do, make sure you've profiled your code to confirm that this is actually worth optimizing. That said, the vectorization would go something like this:
int i;
for(i = 0; i < n - 3; i += 4)
{
    load elements u[i,i+1,i+2,i+3]
    load elements b[i,i+1,i+2,i+3]
    vector multiply u * c
    vector multiply s * b
    add partial results
    store back to u[i,i+1,i+2,i+3]
}
// Finish up the uneven edge cases (or skip if you know n is a multiple of 4)
for( ; i < n; i++)
    u[i] = c * u[i] + s * b[i];
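For concreteness, here is a minimal sketch of the same loop using actual SSE intrinsics. It assumes u and b are arrays of float (the four-at-a-time stride above implies single precision); for double you would use __m128d, a stride of 2, and the _pd variants of the intrinsics:

#include <xmmintrin.h>   /* SSE intrinsics for single precision */

int i;
__m128 cv = _mm_set1_ps(c);                   /* broadcast c into all four lanes */
__m128 sv = _mm_set1_ps(s);                   /* broadcast s into all four lanes */
for (i = 0; i + 4 <= n; i += 4)
{
    __m128 uv = _mm_loadu_ps(&u[i]);          /* load u[i..i+3] */
    __m128 bv = _mm_loadu_ps(&b[i]);          /* load b[i..i+3] */
    __m128 cu = _mm_mul_ps(cv, uv);           /* c * u */
    __m128 sb = _mm_mul_ps(sv, bv);           /* s * b */
    _mm_storeu_ps(&u[i], _mm_add_ps(cu, sb)); /* store back to u[i..i+3] */
}
for ( ; i < n; i++)                           /* scalar remainder */
    u[i] = c * u[i] + s * b[i];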
For even more performance, you can consider prefetching further array elements, and/or unrolling the loop and using software pipelining to interleave the computation in one iteration with the memory accesses from a different iteration.
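As a rough illustration of the prefetch idea, continuing the float sketch above: the lookahead distance of 64 elements is an arbitrary assumption you would tune, and prefetching an address past the end of the array does not fault, so no extra bounds check is needed.

for (i = 0; i + 4 <= n; i += 4)
{
    /* start pulling in data we will need roughly 16 iterations from now */
    _mm_prefetch((const char*)&u[i + 64], _MM_HINT_T0);
    _mm_prefetch((const char*)&b[i + 64], _MM_HINT_T0);

    __m128 uv = _mm_loadu_ps(&u[i]);
    __m128 bv = _mm_loadu_ps(&b[i]);
    _mm_storeu_ps(&u[i], _mm_add_ps(_mm_mul_ps(cv, uv), _mm_mul_ps(sv, bv)));
}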