I'm trying to clamp a value between -127 and 127 on a Cortex-M based microcontroller.
I have two competing functions, one uses conditionals the other uses a branchless hack I found here.
// Using conditional statements
int clamp(int val) { return ((val > 127) ? 127 : (val < -127) ? -127 : val); }
// Using branchless hacks
int clamp(int val) {
val -= -127;
val &= (~val) >> 31;
val += -127;
val -= 127;
val &= val >> 31;
val += 127;
return val;
}
Now I know in some cases one of these methods might be faster than the other, and vise-versa but in general is it worth it to use the branchless technique seeing as it doesn't really matter to me which I use, they both will work just fine in my case?
A little background on the microcontroller, it's a ARM based microcontroller running at 90 MIPS with a 3 stage pipeline, fetch, decode and execute and it seems to have some sort of branch predictor but I couldn't dig up details.
ARM code (GCC 4.6.3 with -O3
):
clamp1:
mvn r3, #126
cmp r0, r3
movlt r0, r3
cmp r0, #127
movge r0, #127
bx lr
clamp2:
add r0, r0, #127
mvn r3, r0
and r0, r0, r3, asr #31
sub r0, r0, #254
and r0, r0, r0, asr #31
add r0, r0, #127
bx lr
Thumb code:
clamp1:
mvn r3, #126
cmp r0, r3
it lt
movlt r0, r3
cmp r0, #127
it ge
movge r0, #127
bx lr
clamp2:
adds r0, r0, #127
mvns r3, r0
and r0, r0, r3, asr #31
subs r0, r0, #254
and r0, r0, r0, asr #31
adds r0, r0, #127
bx lr
Both are branchless thanks to ARM's conditional execution design. I will bet you they are essentially comparable in performance.
Something to realize is the the ARM and x86 architectures are very different when it comes to branch instructions. Taking a jump clears the pipeline which can result in the expediture of a number of clock cycles just to 'get back to where you were' in terms of throughput.
To quote a pdf I downloaded the other day (pg14 of http://simplemachines.it/doc/arm_inst.pdf),
Conditional Execution
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With