Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the most efficient way to make bitwise operations in a C array

I have a C array like:

char byte_array[10];

And another one that acts as a mask:

char byte_mask[10];

I would like to do get another array that is the result from the first one plus the second one using a bitwise operation, on each byte.

What's the most efficient way to do this?

thanks for your answers.

like image 527
alvatar Avatar asked Mar 20 '09 22:03

alvatar


People also ask

How do you maximize bitwise and array?

Approach: Precompute the prefix and suffix OR arrays. In one iteration, multiply an element with x^k and do Bitwise OR it with prefix OR i.e. Bitwise OR of all previous elements and suffix OR i.e. Bitwise OR of all next elements and return the maximum value after all iterations.

How do you maximize Bitwise and operation?

Given a range [L, R], the task is to find a pair (X, Y) such that L ≤ X < Y ≤ R and X & Y is the maximum among all the possible pairs then print the bitwise AND of the found pair. In all the possible pairs, pair (8, 9) gives the maximum value for bitwise AND.

Are bitwise operations fast in C?

Yes, Bitwise operations are alot faster than any arithmetic operations because these operations are performed directly on the bits that is 0 and 1.

Are bitwise operations faster than addition?

On simple low-cost processors, typically, bitwise operations are substantially faster than division, several times faster than multiplication, and sometimes significantly faster than addition.


2 Answers

for ( i = 10 ; i-- > 0 ; )
    result_array[i] = byte_array[i] & byte_mask[i];
  • Going backwards pre-loads processor cache-lines.
  • Including the decrement in the compare can save some instructions.

This will work for all arrays and processors. However, if you know your arrays are word-aligned, a faster method is to cast to a larger type and do the same calculation.

For example, let's say n=16 instead of n=10. Then this would be much faster:

uint32_t* input32 = (uint32_t*)byte_array;
uint32_t* mask32 = (uint32_t*)byte_mask;
uint32_t* result32 = (uint32_t*)result_array;
for ( i = 4 ; i-- > 0 ; )
    result32[i] = input32[i] & mask32[i];

(Of course you need a proper type for uint32_t, and if n is not a power of 2 you need to clean up the beginning and/or ending so that the 32-bit stuff is aligned.)

Variation: The question specifically calls for the results to be placed in a separate array, however it would almost certainly be faster to modify the input array in-place.

like image 157
Jason Cohen Avatar answered Sep 24 '22 20:09

Jason Cohen


If you want to make it faster, make sure that byte_array has length that is multiple of 4 (8 on 64-bit machines), and then:

char byte_array[12];
char byte_mask[12];
/* Checks for proper alignment */
assert(((unsigned int)(void *)byte_array) & 3 == 0);
assert(((unsigned int)(void *)byte_mask) & 3 == 0);
for (i = 0; i < (10+3)/4; i++) {
  ((unsigned int *)(byte_array))[i] &= ((unsigned int *)(byte_mask))[i];
}

This is much faster than doing it byte per byte.

(Note that this is in-place mutation; if you want to keep the original byte_array also, then you obviously need to store the results in another array instead.)

like image 33
Antti Huima Avatar answered Sep 22 '22 20:09

Antti Huima