Which is faster on ARM?
*p++ = (*p >> 7) * 255;
or
*p++ = ((*p >> 7) << 8) - 1
Essentially what I'm doing here is taking an 8-bit word and setting it to 255 if >= 128, and 0 otherwise.
If p is char below statement is just an assignment to 255.
*p++ = ((*p >> 7) << 8) - 1
If p is int, then of course it is a different story.
You can use GCC Explorer to see how the assembly output looks like. Below is appearently what you get from Linaro's arm-linux-gnueabi-g++ 4.6.3 with -O2 -march=armv7-a flags;
void test(char *p) {
*p++ = (*p >> 7) * 255;
}
void test2(char *p) {
*p++ = ((*p >> 7) << 8) - 1 ;
}
void test2_i(int *p) {
*p++ = ((*p >> 7) << 8) - 1 ;
}
void test3(char *p) {
*p++ = *p >= 128 ? ~0 : 0;
}
void test4(char *p) {
*p++ = *p & 0x80 ? ~0 : 0;
}
creates
test(char*):
ldrb r3, [r0, #0] @ zero_extendqisi2
sbfx r3, r3, #7, #1
strb r3, [r0, #0]
bx lr
test2(char*):
movs r3, #255
strb r3, [r0, #0]
bx lr
test2_i(int*):
ldr r3, [r0, #0]
asrs r3, r3, #7
lsls r3, r3, #8
subs r3, r3, #1
str r3, [r0, #0]
bx lr
test3(char*):
ldrsb r3, [r0, #0]
cmp r3, #0
ite lt
movlt r3, #255
movge r3, #0
strb r3, [r0, #0]
bx lr
test4(char*):
ldrsb r3, [r0, #0]
cmp r3, #0
ite lt
movlt r3, #255
movge r3, #0
strb r3, [r0, #0]
bx lr
If you are not nitpicking best is always to check assembly of the generated code over such details. People tend to overestimate compilers, I agree most of the time they do great but I guess it is anyone's right to question generated code.
You should also be careful interpreting instructions, since they won't always match into cycle accurate listing due to core's architectural featuers like having out-of-order, super scalar deep pipelines. So it might not be always shortest sequence of instructions win.
Well, to answer the question in your title, on ARM, a SHIFT+SUB can be done in a single instruction with 1 cycle latenency, while a MUL usually has multiple cycle latency. So the shift will usually be faster.
To answer the implied question of what C code to write for this, generally you are best off with the simplest code that expresses your intent:
*p++ = *p >= 128 ? ~0 : 0; // set byte to all ones iff >= 128
or
*p++ = *p & 0x80 ? ~0 : 0; // set byte to all ones based on the MSB
this will generally get converted by the compiler into the fastest way of doing it, whether that is a shift and whatever, or a conditional move.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With