I have a question about the number of cycles needed for a bitwise operation, more precisely the XOR operation. In my program, I have two 1D arrays of uint8_t values with a fixed size of 8. I want to XOR the two arrays, and I am wondering what the most efficient way to do so is. This code summarizes the options I have found (shown here with 4 elements):
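#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint8_t tab[4] = {1, 0, 0, 2};
    uint8_t tab2[4] = {2, 3, 4, 1};
    /* First option: XOR byte by byte */
    uint8_t tab3[4] = {tab[0]^tab2[0], tab[1]^tab2[1], tab[2]^tab2[2], tab[3]^tab2[3]};
    /* Second option: XOR all four bytes as one 32-bit word.
       memcpy is used instead of casting uint8_t* to uint32_t*,
       which violates strict aliasing and may be misaligned. */
    uint32_t t, t2, t3;
    uint8_t tab4[4];
    memcpy(&t, tab, sizeof t);
    memcpy(&t2, tab2, sizeof t2);
    t3 = t ^ t2;
    memcpy(tab4, &t3, sizeof t3);
    /* Comparison */
    printf("%d & %d\n", tab3[0], tab4[0]);
    printf("%d & %d\n", tab3[1], tab4[1]);
    printf("%d & %d\n", tab3[2], tab4[2]);
    printf("%d & %d\n", tab3[3], tab4[3]);
    return 0;
}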
What is the best option from a cycle/byte point of view?
All the basic bitwise operations (AND, OR, XOR, NOT) execute in one clock cycle or less on almost every processor architecture since the 1960s. I say "or less" because the overhead of fetching instructions, tracking which registers are ready, and so on can push the cost of the bitwise operation itself down into the noise.
To make the algorithm faster, it would be necessary to study the caching characteristics of the data.
Almost any practical algorithm that crunches data with bitwise operations will run faster than the associated I/O. Hashing algorithms (like the SHA family) are probably the exception.
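As a rough illustration of the two options from the question for the 8-byte case, here is a minimal sketch. The names `xor_bytes`, `xor_word`, and `N` are made up for this example and do not come from the question. It assumes a platform where `uint64_t` exists and is 8 bytes wide. Optimizing compilers will often reduce both versions to much the same machine code, but that is an assumption worth checking in the generated assembly for your own compiler and target.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

#define N 8  /* hypothetical name for the fixed array size from the question */

/* Option A: plain byte-by-byte loop. */
static void xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b)
{
    for (size_t i = 0; i < N; i++)
        out[i] = a[i] ^ b[i];
}

/* Option B: one 64-bit XOR. memcpy moves the bytes in and out of the
   word, which avoids the aliasing and alignment problems of casting
   a uint8_t pointer to a wider integer pointer. */
static void xor_word(uint8_t *out, const uint8_t *a, const uint8_t *b)
{
    static_assert(sizeof(uint64_t) == N, "N must match the word size");
    uint64_t x, y;
    memcpy(&x, a, N);
    memcpy(&y, b, N);
    x ^= y;
    memcpy(out, &x, N);
}
```

Both functions write the same bytes to `out`, since XOR works on each bit independently of byte order. Which one is faster in practice, if either, depends on the compiler and the surrounding code rather than on the XOR itself.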