What is the best (cleanest, most efficient) way to write saturating addition in C?
The function or macro should add two unsigned inputs (need both 16- and 32-bit versions) and return all-bits-one (0xFFFF or 0xFFFFFFFF) if the sum overflows.
Target is x86 and ARM using gcc (4.1.2) and Visual Studio (for simulation only, so a fallback implementation is OK there).
Unsigned saturation specifies that the value is then clipped to its range. In other words, values less than 0 are converted to 0, more than 255 to 255.
Saturation arithmetic is a version of arithmetic in which all operations such as addition and multiplication are limited to a fixed range between a minimum and maximum value.
The saturation number of H, written sat(n,H) is the minimum number of edges in an H-saturated graph with n vertices (assuming n≥|V(H)|). Background: Erdős, Hajnal, and Moon [EHM] proved the first result about saturation numbers: sat(n,Kt) = (t-2)[n-½(t-1)].
What is saturating arithmetic and how does it help? Saturating arithmetic eliminates the need for overflows and underflows on MMX instructions. If a result is out of bounds, it is saturated to the closest allowable value. Many multimedia applications benefit from saturating arithmetic by avoiding wraparound artifacts.
You probably want portable C code here, which your compiler will turn into proper ARM assembly. ARM has conditional moves, and these can be conditional on overflow. The algorithm then becomes: add and conditionally set the destination to unsigned(-1), if overflow was detected.
uint16_t add16(uint16_t a, uint16_t b) { uint16_t c = a + b; if (c < a) /* Can only happen due to overflow */ c = -1; return c; }
Note that this differs from the other algorithms in that it corrects overflow, instead of relying on another calculation to detect overflow.
x86-64 clang 3.7 -O3 output for adds32: significantly better than any other answer:
add edi, esi mov eax, -1 cmovae eax, edi ret
ARMv7: gcc 4.8 -O3 -mcpu=cortex-a15 -fverbose-asm
output for adds32:
adds r0, r0, r1 @ c, a, b it cs movcs r0, #-1 @ conditional-move bx lr
16bit: still doesn't use ARM's unsigned-saturating add instruction (UADD16
)
add r1, r1, r0 @ tmp114, a movw r3, #65535 @ tmp116, uxth r1, r1 @ c, tmp114 cmp r0, r1 @ a, c ite ls @ movls r0, r1 @,, c movhi r0, r3 @,, tmp116 bx lr @
In plain C:
uint16_t sadd16(uint16_t a, uint16_t b) { return (a > 0xFFFF - b) ? 0xFFFF : a + b; } uint32_t sadd32(uint32_t a, uint32_t b) { return (a > 0xFFFFFFFF - b) ? 0xFFFFFFFF : a + b; }
which is almost macro-ized and directly conveys the meaning.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With