Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Saturate short (int16) in C++

I am optimizing bottleneck code:

int sum = ........
sum = (sum >> _bitShift);

if (sum > 32000)
    sum = 32000; //if we get an overflow, saturate output
else if (sum < -32000)
    sum = -32000; //if we get an underflow, saturate output

short result = static_cast<short>(sum);

I would like to write the saturation condition as one "if condition" or even better with no "if condition" to make this code faster. I don't need saturation exactly at value 32000, any similar value like 32768 is acceptable.

According this page, there is a saturation instruction in ARM. Is there anything similar in x86/x64?

like image 783
Tomas Kubes Avatar asked Dec 28 '25 23:12

Tomas Kubes


1 Answers

I'm not at all convinced that attempting to eliminate the if statement(s) is likely to do any real good. A quick check indicates that given this code:

int clamp(int x) {
    if (x < -32768)
        x = -32768;
    else if (x > 32767)
        x = 32767;
    return x;
}

...both gcc and Clang produce branch-free results like this:

clamp(int):
  cmp edi, 32767
  mov eax, 32767
  cmovg edi, eax
  mov eax, -32768
  cmp edi, -32768
  cmovge eax, edi
  ret

You can do something like x = std::min(std::max(x, -32768), 32767);, but this produces the same sequence, and the source seems less readable, at least to me.

You can do considerably better than this if you use Intel's vector instructions, but probably only if you're willing to put a fair amount of work into it--in particular, you'll probably need to operate on an entire (small) vector of values simultaneously to accomplish much this way. If you do go that way, you usually want to take a somewhat different approach to the task than you seem to be taking right now. Right now, you're apparently depending on int being a 32-bit type, so you're doing the arithmetic on a 32-bit type, then afterwards truncating it back down to a (saturated) 16-bit value.

With something like AVX, you'd typically want to use an instruction like _mm256_adds_epi16 to take a vector of 16 values (of 16-bits apiece), and do a saturating addition on all of them at once (or, likewise, _mm256_subs_epi16 to do saturating subtraction).

Since you're writing C++, what I've given above are the names of the compiler intrinsics used in most current compilers (gcc, icc, clang, msvc) for x86 processors. If you're writing assembly language directly, the instructions would be vpaddsw and vpsubsw respectively.

If you can count on a really current processor (one that supports AVX 512 instructions) you can use them instead to operate on a vector of 32 16-bit values simultaneously.

like image 73
Jerry Coffin Avatar answered Dec 30 '25 11:12

Jerry Coffin



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!