Are there any branch-less or similar hacks for clamping an integer to the interval of 0 to 255, or a double to the interval of 0.0 to 1.0? (Both ranges are meant to be closed, i.e. endpoints are inclusive.)
I'm using the obvious minimum-maximum check:
int value = (value < 0? 0 : value > 255? 255 : value);
but is there a way to get this faster -- similar to the "modulo" clamp value & 255
? And is there a way to do similar things with floating points?
I'm looking for a portable solution, so preferably no CPU/GPU-specific stuff please.
This is a trick I use for clamping an int to a 0 to 255 range:
/**
* Clamps the input to a 0 to 255 range.
* @param v any int value
* @return {@code v < 0 ? 0 : v > 255 ? 255 : v}
*/
public static int clampTo8Bit(int v) {
// if out of range
if ((v & ~0xFF) != 0) {
// invert sign bit, shift to fill, then mask (generates 0 or 255)
v = ((~v) >> 31) & 0xFF;
}
return v;
}
That still has one branch, but a handy thing about it is that you can test whether any of several ints are out of range in one go by ORing them together, which makes things faster in the common case that all of them are in range. For example:
/** Packs four 8-bit values into a 32-bit value, with clamping. */
public static int ARGBclamped(int a, int r, int g, int b) {
if (((a | r | g | b) & ~0xFF) != 0) {
a = clampTo8Bit(a);
r = clampTo8Bit(r);
g = clampTo8Bit(g);
b = clampTo8Bit(b);
}
return (a << 24) + (r << 16) + (g << 8) + (b << 0);
}
Note that your compiler may already give you what you want if you code value = min (value, 255)
. This may be translated into a MIN
instruction if it exists, or into a comparison followed by conditional move, such as the CMOVcc
instruction on x86.
The following code assumes two's complement representation of integers, which is usually a given today. The conversion from Boolean to integer should not involve branching under the hood, as modern architectures either provide instructions that can directly be used to form the mask (e.g. SETcc
on x86 and ISETcc
on NVIDIA GPUs), or can apply predication or conditional moves. If all of those are lacking, the compiler may emit a branchless instruction sequence based on arithmetic right shift to construct a mask, along the lines of Boann's answer. However, there is some residual risk that the compiler could do the wrong thing, so when in doubt, it would be best to disassemble the generated binary to check.
int value, mask;
mask = 0 - (value > 255); // mask = all 1s if value > 255, all 0s otherwise
value = (255 & mask) | (value & ~mask);
On many architectures, use of the ternary operator ?:
can also result in a branchless instruction sequences. The hardware may support select-type instructions which are essentially the hardware equivalent of the ternary operator, such as ICMP
on NVIDIA GPUs. Or it provides CMOV
(conditional move) as in x86, or predication as on ARM, both of which can be used to implement branch-less code for ternary operators. As in the previous case, one would want to examine the disassembled binary code to be absolutely sure the resulting code is without branches.
int value;
value = (value > 255) ? 255 : value;
In case of floating-point operands, modern floating-point units typically provide FMIN
and FMAX
instructions which map straight to the C/C++ standard math functions fmin()
and fmax()
. Alternatively fmin()
and fmax()
may be translated into a comparison followed by a conditional move. Again, it would be prudent to examine the generated code to make sure it is branchless.
double value;
value = fmax (fmin (value, 1.0), 0.0);
I use this thing, 100% branchless.
int clampU8(int val)
{
val &= (val<0)-1; // clamp < 0
val |= -(val>255); // clamp > 255
return val & 0xFF; // mask out
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With