Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SSE2 integer overflow checking

Tags:

When using SSE2 instructions such as PADDD (i.e., the _mm_add_epi32 intrinsic), is there a way to check whether any of the operations overflowed?

I thought that maybe a flag on the MXCSR control register may get set after an overflow, but I don't see that happening. For example, _mm_getcsr() prints the same value in both cases below (8064):

#include <iostream>
#include <emmintrin.h>

using namespace std;

void main()
{
    __m128i a = _mm_set_epi32(1, 0, 0, 0);
    __m128i b = _mm_add_epi32(a, a);
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << b.m128i_i32[3] << endl;

    __m128i c = _mm_set_epi32((1<<31)-1, 3, 2, 1);
    __m128i d = _mm_add_epi32(c, c);
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << d.m128i_i32[3] << endl;
}

Is there some other way to check for overflow with SSE2?

like image 600
Igor ostrovsky Avatar asked May 09 '12 06:05

Igor ostrovsky


1 Answers

Here is a somewhat more efficient version of @hirschhornsalz's sum_and_overflow function:

void sum_and_overflow(__v4si a, __v4si b, __v4si& sum, __v4si& overflow)
{
   __v4si sa, sb;

    sum = _mm_add_epi32(a, b);                  // calculate sum
    sa = _mm_xor_si128(sum, a);                 // compare sign of sum with sign of a
    sb = _mm_xor_si128(sum, b);                 // compare sign of sum with sign of b
    overflow = _mm_and_si128(sa, sb);           // get overflow in sign bit
    overflow = _mm_srai_epi32(overflow, 31);    // convert to SIMD boolean (-1 == TRUE, 0 == FALSE)
}

It uses an expression for overflow detection from Hacker's Delight page 27:

sum = a + b;
overflow = (sum ^ a) & (sum ^ b);               // overflow flag in sign bit

Note that the overflow vector will contain the more conventional SIMD boolean values of -1 for TRUE (overflow) and 0 for FALSE (no overflow). If you only need the overflow in the sign bit and the other bits are "don't care" then you can omit the last line of the function, reducing the number of SIMD instructions from 5 to 4.

NB: this solution, as well as the previous solution on which it is based are for signed integer values. A solution for unsigned values will require a slightly different approach (see @Stephen Canon's answer).

like image 173
Paul R Avatar answered Oct 07 '22 11:10

Paul R