
Using SSE instructions

I have a loop written in C++ that is executed for each element of a big integer array. Inside the loop, I mask some bits of the integer and then find the min and max values. I heard that if I use SSE instructions for these operations, it will run much faster than a normal loop written using bitwise AND and if-else conditions. My question is: should I go for these SSE instructions? Also, what happens if my code runs on a different processor? Will it still work, or are these instructions processor-specific?

Naveen asked Feb 25 '09 15:02





1 Answer

  1. SSE instructions are processor-specific. Code that uses an instruction the CPU does not implement will fail with an illegal-instruction fault, so you have to know which SSE version your target machines support. You can look up which processor supports which SSE version on Wikipedia.
  2. Whether SSE code is faster depends on many factors. The first is whether the problem is memory-bound or CPU-bound: if the memory bus is the bottleneck, SSE will not help much. As a test, try simplifying your integer calculations; if that makes the code faster, the loop is probably CPU-bound and you have a good chance of speeding it up.
  3. Be aware that writing SIMD code is a lot harder than writing plain C++, and that the resulting code is much harder to change. Always keep the C++ version up to date: you'll want it as a comment and as a reference for checking the correctness of your assembler code.
  4. Consider using a library such as Intel IPP (Integrated Performance Primitives), which implements common low-level SIMD operations optimized for various processors.
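
To make the points above concrete, here is a minimal sketch of the mask-then-min/max loop from the question, written with SSE2 intrinsics rather than raw assembler. It assumes the array holds signed 32-bit integers, and the function names (`masked_minmax`, `masked_minmax_scalar`, `select_epi32`) are made up for illustration. SSE2 has no packed 32-bit integer min/max (those instructions came with SSE4.1), so the sketch emulates them with a compare and a bitwise select. The plain C++ loop is kept alongside as the reference, and a compile-time guard falls back to it when SSE2 is not available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Plain C++ reference: mask each element, then track min and max.
std::pair<std::int32_t, std::int32_t>
masked_minmax_scalar(const std::int32_t* data, std::size_t n, std::int32_t mask) {
    std::int32_t lo = std::numeric_limits<std::int32_t>::max();
    std::int32_t hi = std::numeric_limits<std::int32_t>::min();
    for (std::size_t i = 0; i < n; ++i) {
        const std::int32_t v = data[i] & mask;
        lo = std::min(lo, v);
        hi = std::max(hi, v);
    }
    return {lo, hi};
}

#ifdef HAVE_SSE2
// Per-lane select: cond lanes are all-ones or all-zeros.
// Picks a where cond is set, b elsewhere.
static inline __m128i select_epi32(__m128i cond, __m128i a, __m128i b) {
    return _mm_or_si128(_mm_and_si128(cond, a), _mm_andnot_si128(cond, b));
}
#endif

// Same result as the scalar version; processes four ints per iteration
// when SSE2 is available at compile time.
std::pair<std::int32_t, std::int32_t>
masked_minmax(const std::int32_t* data, std::size_t n, std::int32_t mask) {
    assert(data != nullptr || n == 0);
#ifdef HAVE_SSE2
    const __m128i vmask = _mm_set1_epi32(mask);
    __m128i vlo = _mm_set1_epi32(std::numeric_limits<std::int32_t>::max());
    __m128i vhi = _mm_set1_epi32(std::numeric_limits<std::int32_t>::min());
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Unaligned load, so the input needs no special alignment.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        v = _mm_and_si128(v, vmask);
        vlo = select_epi32(_mm_cmpgt_epi32(vlo, v), v, vlo);  // lane-wise min
        vhi = select_epi32(_mm_cmpgt_epi32(v, vhi), v, vhi);  // lane-wise max
    }
    // Reduce the four lanes to one value each.
    std::int32_t tlo[4], thi[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(tlo), vlo);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(thi), vhi);
    std::int32_t lo = *std::min_element(tlo, tlo + 4);
    std::int32_t hi = *std::max_element(thi, thi + 4);
    // Handle the remaining 0..3 elements with scalar code.
    for (; i < n; ++i) {
        const std::int32_t v = data[i] & mask;
        lo = std::min(lo, v);
        hi = std::max(hi, v);
    }
    return {lo, hi};
#else
    return masked_minmax_scalar(data, n, mask);
#endif
}
```

SSE2 is part of the x86-64 baseline, so this version should run on any 64-bit x86 processor. Code that relies on later extensions such as SSE4.1 would need a runtime CPU check or a separate build. Whether it actually beats the scalar loop still has to be measured, since a modern compiler may already vectorize the plain loop and a memory-bound loop gains little either way.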
Niki answered Sep 22 '22 20:09
