What's the most efficient way to make bitwise operations in a C array

Q: Are bitwise operations fast in C?

Yes. Bitwise operations such as AND, OR, and XOR map directly onto single machine instructions. On most modern processors they cost about the same as an addition and are cheaper than multiplication or division.

Q: Are bitwise operations faster than addition?

On simple low-cost processors, typically, bitwise operations are substantially faster than division, several times faster than multiplication, and sometimes significantly faster than addition.

Tags:

performance

c

micro-optimization

I have a C array like:

char byte_array[10];

And another one that acts as a mask:

char byte_mask[10];

I would like to get another array that is the result of combining the first one with the second one using a bitwise operation on each byte.

What's the most efficient way to do this?

Thanks for your answers.

527

asked Mar 20 '09 22:03

alvatar

2 Answers

for ( i = 10 ; i-- > 0 ; )
    result_array[i] = byte_array[i] & byte_mask[i];

Going backwards may pre-load processor cache lines on some hardware; measure before relying on it.
Including the decrement in the compare can save some instructions.
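For comparison, the same per-byte loop can be wrapped in a function and written as a plain forward loop. This is only a minimal sketch: the name and_bytes and the choice of unsigned char (to keep sign extension out of the picture) are assumptions, not part of the answer above. Optimizing compilers can often vectorize a simple loop like this on their own, so it is worth benchmarking before hand-tuning.

```c
#include <stddef.h>

/* and_bytes is a hypothetical helper name, not from the original answer.
   It ANDs each byte of `in` with the matching byte of `mask` and stores
   the result in `out`. `out` may be the same buffer as `in`. */
void and_bytes(unsigned char *out, const unsigned char *in,
               const unsigned char *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] & mask[i];
}
```

For the arrays in the question this would be called as and_bytes(result_array, byte_array, byte_mask, 10), after casting the char pointers to unsigned char pointers.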

This will work for all arrays and processors. However, if you know your arrays are word-aligned, a faster method is to cast to a larger type and do the same calculation.

For example, let's say n=16 instead of n=10. Then this would be much faster:

uint32_t* input32 = (uint32_t*)byte_array;
uint32_t* mask32 = (uint32_t*)byte_mask;
uint32_t* result32 = (uint32_t*)result_array;
for ( i = 4 ; i-- > 0 ; )
    result32[i] = input32[i] & mask32[i];

(Of course you need a proper definition of uint32_t, for example from <stdint.h>, and if n is not a multiple of 4 you need to handle the leftover bytes at the beginning and/or end separately so that the 32-bit accesses stay aligned.)
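Casting a char pointer to uint32_t* relies on the buffers being suitably aligned, and it can also run into C's strict-aliasing rules. One common way around both, sketched here under the assumption that a swapped-in memcpy-based approach is acceptable, is to copy each 4-byte group into a local uint32_t, AND it there, and copy it back; compilers usually turn these small fixed-size memcpy calls into ordinary loads and stores. The name and_words is hypothetical. The trailing loop is the cleanup step for lengths that are not a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* and_words is a hypothetical helper name, not from the original answer.
   It processes 4 bytes at a time through local uint32_t temporaries,
   then finishes the remaining 0 to 3 bytes one at a time. */
void and_words(unsigned char *out, const unsigned char *in,
               const unsigned char *mask, size_t n)
{
    size_t i = 0;

    /* Whole 32-bit groups; memcpy avoids alignment and aliasing issues. */
    for (; i + sizeof(uint32_t) <= n; i += sizeof(uint32_t)) {
        uint32_t a, m;
        memcpy(&a, in + i, sizeof a);
        memcpy(&m, mask + i, sizeof m);
        a &= m;
        memcpy(out + i, &a, sizeof a);
    }

    /* Leftover bytes when n is not a multiple of 4. */
    for (; i < n; i++)
        out[i] = in[i] & mask[i];
}
```

With n=10 this handles bytes 0 to 7 as two 32-bit groups and bytes 8 and 9 individually. Because AND works bit by bit, the result does not depend on the machine's byte order.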

Variation: The question specifically calls for the results to be placed in a separate array. However, it would almost certainly be faster to modify the input array in place.

157

answered Sep 24 '22 20:09

Jason Cohen

If you want to make it faster, make sure that byte_array has a length that is a multiple of 4 (8 on 64-bit machines), and then:

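/* needs <assert.h> and <stdint.h> */
char byte_array[12];
char byte_mask[12];
int i;
/* Checks for proper alignment. The extra parentheses matter because
   == binds more tightly than &; uintptr_t avoids truncating the pointer. */
assert(((uintptr_t)(void *)byte_array & 3) == 0);
assert(((uintptr_t)(void *)byte_mask & 3) == 0);
/* (10+3)/4 = 3 words, which covers all 12 bytes */
for (i = 0; i < (10+3)/4; i++) {
  ((unsigned int *)(byte_array))[i] &= ((unsigned int *)(byte_mask))[i];
}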

This is much faster than doing it byte by byte.

(Note that this is in-place mutation; if you want to keep the original byte_array also, then you obviously need to store the results in another array instead.)
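The alignment check used above can be pulled out into a small helper. This is a sketch only: is_aligned is a hypothetical name, and it assumes the alignment passed in is a power of two. Because == binds more tightly than & in C, the masked value needs its own parentheses, and converting the pointer through uintptr_t keeps all of its bits on 64-bit machines.

```c
#include <stddef.h>
#include <stdint.h>

/* is_aligned is a hypothetical helper name, not from the original answer.
   Returns 1 if p is a multiple of `alignment`, 0 otherwise.
   Assumes `alignment` is a power of two (for example 4 or 8). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

It could then be used as assert(is_aligned(byte_array, 4)) before the word-sized loop. Note that a plain char array is not guaranteed to be 4-byte aligned, so the check can legitimately fail.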

answered Sep 22 '22 20:09

Antti Huima
