Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster approach to checking for an all-zero buffer in C?

Tags:

I am searching for a faster method of accomplishing this:

int is_empty(char * buf, int size) 
{
    int i;
    for(i = 0; i < size; i++) {
        if(buf[i] != 0) return 0;
    }
    return 1;
}

I realize I'm searching for a micro optimization unnecessary except in extreme cases, but I know a faster method exists, and I'm curious what it is.

like image 235
Rob Avatar asked Sep 29 '09 17:09

Rob


People also ask

How do you check if an array is zero?

To check if an array is empty or not, you can use the .length property. The length property sets or returns the number of elements in an array. By knowing the number of elements in the array, you can tell if it is empty or not. An empty array will have 0 elements inside of it.

What does zero buffer mean?

Explanation. Zero Buffer Credit Rate measures the percentage of time that the port was unable to send frames due to a lack of buffer credits at the port over a particular time interval.


1 Answers

On many architectures, comparing 1 byte takes the same amount of time as 4 or 8, or sometimes even 16. 4 bytes is normally easy (either int or long), and 8 is too (long or long long). 16 or higher probably requires inline assembly to e.g., use a vector unit.

Also, a branch mis-predictions really hurt, it may help to eliminate branches. For example, if the buffer is almost always empty, instead of testing each block against 0, bit-or them together and test the final result.


Expressing this is difficult in portable C: casting a char* to long* violates strict aliasing. But fortunately you can use memcpy to portably express an unaligned multi-byte load that can alias anything. Compilers will optimize it to the asm you want.

For example, this work-in-progress implementation (https://godbolt.org/z/3hXQe7) on the Godbolt compiler explorer shows that you can get a good inner loop (with some startup overhead) from loading two consecutive uint_fast32_t vars (often 64-bit) with memcpy and then checking tmp1 | tmp2, because many CPUs will set flags according to an OR result, so this lets you check two words for the price of one.

Getting it to compile efficiently for targets without efficient unaligned loads requires some manual alignment in the startup code, and even then gcc may not inline the memcpy for loads where it can't prove alignment.

like image 129
derobert Avatar answered Oct 19 '22 14:10

derobert