I am looking for a portable way to generate prefix bitmasks which have the first n
bits set for 0 <= n <= 32
(or 64 or an arbitrary integer type bit width).
Examples:
prefix_bitmask(0) = 0b00000000000000000000000000000000u
prefix_bitmask(4) = 0b00000000000000000000000000001111u
prefix_bitmask(32) = 0b11111111111111111111111111111111u
There are two ways this can already work if we ignore the cases n == 0
or n == 32
:
// "constructive": set only the required bits
uint32_t prefix_mask1(int i) { return (uint32_t(1) << i) - 1; }
// "destructive": shift unneeded bits out
uint32_t prefix_mask2(int i) { return ~uint32_t(0) >> (32 - i); }
prefix_mask1
fails for 32 and prefix_mask2
fails for 0, both because shifts larger than the integer type are undefined behavior (because CPUs are allowed to only use the lowest 5 bits of the shift size).
Is there a "canonical" way to solve this without branching?
((uint32_t) 1 << i/2 << i-i/2) - 1
.
The above works where uint32_t
may be replaced with any unsigned type, and no other changes are needed. Other options that require knowing the number of bits b
in the type and a mask m
= 2b
−1 include:
((uint32_t) 1 << (i & m)) - 1 - (i >> b)
(from supercat)
and:
((uint32_t) i >> b) ^ 1) << (i & m)) - 1
(derived from a suggestion by Matt Timmermans).
This can be done using the prefix_mask2
idea with arithmetic shifts to prepare the correct pattern, with three instructions total (assuming shift counts in the CPU are modulo word width):
// minimal instruction dependency (2 cycles), but requires large constant
// that some architectures have trouble generating
uint32_t prefix_mask2a(int i) {
return ((int32_t) (i + (0x80000000 - 32))) >> ((i ^ 31) & 31);
}
// 3 cycles
uint32_t prefix_mask2b(int i) {
return (uint32_t) ((int32_t) -i >> 31) >> (-i & 31);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With