Could anyone tell me please what is an efficient algorithm to count the number of leading zeroes in a 32-bit unsigned integer in C programming?
This discussion assumes that your compiler either doesn't support the operation or that it doesn't produce good enough assembly. Note that both of these are unlikely nowadays so I'd recommend just using __builtin_clz
for gcc or equivalent on your compiler.
Note that determining which is the "best" clz algo can only be done by you. Modern processors are complicated beasts and the performance of these algorithm will heavily depend on the platform you run it on, the data you throw at it and the code that uses it. The only way to be sure is to measure, measure and measure some more. If you can't tell the difference then you're probably not looking at your bottleneck and your time will be better spent elsewhere.
Now that the boring disclaimers are out of the way, let's take a look at what Hacker's Delight has to say about the problem. A quick survey shows that all algorithms rely on a binary search of some description. Here's a straightforward example:
int n = 32;
unsigned y;
y = x >>16; if (y != 0) { n = n -16; x = y; }
y = x >> 8; if (y != 0) { n = n - 8; x = y; }
y = x >> 4; if (y != 0) { n = n - 4; x = y; }
y = x >> 2; if (y != 0) { n = n - 2; x = y; }
y = x >> 1; if (y != 0) return n - 2;
return n - x;
Note that this works on 32 ints and that it can also be transformed into an iterative version if needed. Unfortunately that solution doesn't have a whole lot of instruction-level parallelism and has quite a few branches which doesn't make for a very good bit twiddling algo. Note that a branch free version of the above code exists but it's much more verbose so I won't reproduce here.
So let's improve on the solution by using the pop instruction (counts the number of bits):
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >>16);
return pop(~x);
So how does this work? The key is the pop(~x)
instruction at the end which counts the numbers of zeros in x
. For the count of zeros to be meaningful we first need to get rid of all 0s which aren't leading. We do this by right propagating the 1s using a binary algorithm. While we still don't have much instruction-level parallelism, we did get rid of all the branches and it uses fewer cycles then the previous solution. Much better.
So how about that pop instruction, isn't that cheating? Most architecture have a 1 cycle pop instruction which can be accessed via compiler builtins (eg. gcc's __builtin_pop
). Otherwise there exist table based solutions but care must be taken when trading off cycles for cache accesses even if the table is kept entirely in the L1 cache.
Finally, as is usual for hacker's delight, we start wandering in strange territories. Let's count us some leading zeroes using floating point numbers:
union {
unsigned asInt[2];
double asDouble;
};
asDouble = (double)k + 0.5;
return 1054 - (asInt[LE] >> 20);
First off, a little warning: DON'T USE THIS ALGORITHM. This triggers undefined behaviour as far as the standard is concerned. This was reproduce for the fun factor more then any practical uses. Use at your own peril.
Now that the disclaimers are out of the way, how does it work? It first converts the int into a double and goes on to extract the exponent component of the double. Neat stuff. The LE constant should be 1 if executed on a little-endian machine and 0 on a big-endian machine.
This should give you a brief survey of various bit twiddling algorithms for this problem. Note that the book has several variations of these which makes various tradeoffs but I'll let you discover those on your own.
This is probably the optimal way to do it in pure C:
int clz(uint32_t x)
{
static const char debruijn32[32] = {
0, 31, 9, 30, 3, 8, 13, 29, 2, 5, 7, 21, 12, 24, 28, 19,
1, 10, 4, 14, 6, 22, 25, 20, 11, 15, 23, 26, 16, 27, 17, 18
};
x |= x>>1;
x |= x>>2;
x |= x>>4;
x |= x>>8;
x |= x>>16;
x++;
return debruijn32[x*0x076be629>>27];
}
One limitation: as written, it does not support an input of zero (where the result should be 32). If all your inputs are less than 0x80000000
, you can support zero with no additional cost by changing the first value in the table to 32. Otherwise, just add a line at the beginning:
if (!x) return 32;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With