Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to use SSE and SSE2 to make a 128-bit wide integer?

Tags:

assembly

sse

sse2

I'm looking to understand SSE2's capabilities a little more, and would like to know if one could make a 128-bit wide integer that supports addition, subtraction, XOR and multiplication?

like image 338
Erkling Avatar asked Aug 30 '12 15:08

Erkling


People also ask

How do you write a 128-bit integer?

DrinkMoreBoilingWater's blog. As an extension the integer scalar type __int128 is supported for targets which have an integer mode wide enough to hold 128 bits. Simply write __int128 for a signed 128-bit integer, or unsigned __int128 for an unsigned 128-bit integer.

Is there a 128-bit integer limit?

The 128-bit data type can handle up to 31 significant digits (compared to 17 handled by the 64-bit long double).

What is a 128-bit integer?

In computer architecture, 128-bit integers, memory addresses, or other data units are those that are 128 bits (16 octets) wide. Also, 128-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those that are based on registers, address buses, or data buses of that size.


1 Answers

SIMD is meant to work on multiple small values at the same time, hence there won't be any carry over to the higher unit and you must do that manually. In SSE2 there's no carry flag but you can easily calculate the carry as carry = sum < a or carry = sum < b like this. Worse yet, SSE2 doesn't have 64-bit comparisons either, so you must use some workaround like the one here

Here is an untested, unoptimized C code based on the idea above:

inline bool lessthan(__m128i a, __m128i b){
    a = _mm_xor_si128(a, _mm_set1_epi32(0x80000000));
    b = _mm_xor_si128(b, _mm_set1_epi32(0x80000000));
    __m128i t = _mm_cmplt_epi32(a, b);
    __m128i u = _mm_cmpgt_epi32(a, b);
    __m128i z = _mm_or_si128(t, _mm_shuffle_epi32(t, 177));
    z = _mm_andnot_si128(_mm_shuffle_epi32(u, 245),z);
    return _mm_cvtsi128_si32(z) & 1;
}

inline __m128i addi128(__m128i a, __m128i b)
{
    __m128i sum = _mm_add_epi64(a, b);
    __m128i mask = _mm_set1_epi64(0x8000000000000000);    
    if (lessthan(_mm_xor_si128(mask, sum), _mm_xor_si128(mask, a)))
    {
        __m128i ONE = _mm_setr_epi64(0, 1);
        sum = _mm_add_epi64(sum, ONE);
    }

    return sum;
}

As you can see, the code requires many more instructions and even after optimizing it may still be much longer than a simple 2 ADD/ADC pair in x86_64 (or 4 instructions in x86)


SSE2 will help though, if you have multiple 128-bit integers to add in parallel. However you need to arrange the high and low parts of the values properly so that we can add all the low parts at once, and all the high parts at once

See also

  • practical BigNum AVX/SSE possible?
  • Can long integer routines benefit from SSE?
like image 117
phuclv Avatar answered Nov 11 '22 14:11

phuclv