I have a mass of data, maybe 4MB. Now want to check if all bits in it are 0.
Eg: Here is the data:
void* data = malloc(4*1024*1024); memset(data, 0, 4*1024*1024);
Check if all bits in it are 0. Here is my solution which is not fast enough:
int dataisnull(char* data, int length) { int i = 0; while(i<length){ if (data[i]) return 0; i++; } return 1; }
This code might have some things to improve in performance. For example, in 32/64 bits machine, checking 4/8 bytes at a time may be faster.
So I wonder what is the fastest way to do it?
In C or C++, there is no special method for comparing NULL values. We can use if statements to check whether a variable is null or not. Here we will see one program. We will try to open a file in read mode, that is not present in the system. So the function will return null value. We can check it using if statement.
Historically, C represented NULL as the number 0 (that is, 0x00). Nowadays it can get a bit more complicated, and varies by operating system. You can usually check for NULL using ptr == 0, but there are corner cases where this can cause an issue.
A simple if (ptr) tests whether ptr is TRUE. It will return FALSE if ptr is NULL, or if ptr is 0. The distinction doesn't matter in many cases, but be aware that these are not identical in all architectures. [1] The reverse of this is if (!ptr), which will return TRUE if ptr is FALSE. Set a pointer before checking for NULL.
If a function can return NULL, think about whether this is a possibility, and whether that would cause problems later in your code. Here's an example of the malloc function using the null check ( if (ptr)) to ensure it only handles pointers with valid values:
You can handle multiple bytes at a time and unroll the loop:
int dataisnull(const void *data, size_t length) { /* assuming data was returned by malloc, thus is properly aligned */ size_t i = 0, n = length / sizeof(size_t); const size_t *pw = data; const unsigned char *pb = data; size_t val; #define UNROLL_FACTOR 8 #if UNROLL_FACTOR == 8 size_t n1 = n - n % UNROLL_FACTOR; for (; i < n1; i += UNROLL_FACTOR) { val = pw[i + 0] | pw[i + 1] | pw[i + 2] | pw[i + 3] | pw[i + 4] | pw[i + 5] | pw[i + 6] | pw[i + 7]; if (val) return 0; } #endif val = 0; for (; i < n; i++) { val |= pw[i]; } for (i = n * sizeof(size_t); i < length; i++) { val |= pb[i]; } return val == 0; }
Depending on your specific problem, it might be more efficient to detect non zero values early or late:
val
accumulator and test only at the end.The unrolled version above is a compromise that tests for non zero values every 64 or 128 bytes depending on the size of size_t
.
Depending on your compiler and processor, you might get better performance by unrolling less or more. You could also use intrinsic functions available for your particular architecture to take advantage of vector types, but it would be less portable.
Note that the code does not verify proper alignment for the data
pointer:
malloc
or similar, hence properly aligned for any type.As always, benchmark different solutions to see if it makes a real difference. This function might not be a bottleneck at all, writing a complex function to optimize a rare case is counterproductive, it makes the code less readable, more likely to contain bugs and much less maintainable. For example, the assumption on data alignment may not hold if you change memory allocation scheme or if you use static arrays, the function may invoke undefined behavior then.
The following checks if the first byte is what you want, and all subsequent pairs of bytes are the same.
int check_bytes(const char * const data, size_t length, const char val) { if(length == 0) return 1; if(*data != val) return 0; return memcmp(data, data+1, length-1) ? 0 : 1; } int check_bytes64(const char * const data, size_t length, const char val) { const char * const aligned64_start = (char *)((((uintptr_t)data) + 63) / 64 * 64); const char * const aligned64_end = (char *)((((uintptr_t)data) + length) / 64 * 64); const size_t start_length = aligned64_start - data; const size_t aligned64_length = aligned64_end - aligned64_start; const size_t end_length = length - start_length - aligned64_length; if (!check_bytes(data, start_length, val)) return 0; if (!check_bytes(aligned64_end, end_length, val)) return 0; return memcmp(aligned64_start, aligned64_start + 64, aligned64_length-64) ? 0 : 1; }
A more elaborate version of this function should probably pass cache-line-aligned pointers to memcmp
, and manually check the remaining blocks(s) instead of just the first byte.
Of course, you will have to profile on your specific hardware to see if there is any speed benefit of this method vs others.
If anyone doubts whether this works, ideone.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With