An alternative method is to shift the number right until it becomes 0. The number of shifts needed to reach 0 is the position of the set bit, counting from 1.
In computing, the least significant bit (LSB) is the bit position in a binary integer representing the binary 1s place of the integer. Similarly, the most significant bit (MSB) represents the highest-order place of the binary integer.
The rightmost set bit can be isolated using the two's complement, i.e. via N & ~(N - 1), or using the XOR operator, e.g. N ^ (N & (N - 1)), where N is the given number. The leftmost set bit can be found by right-shifting N until it becomes 0 and counting the shifts.
In a binary number, the bit furthest to the left is called the most significant bit (MSB) and the bit furthest to the right is called the least significant bit (LSB). In a signed two's-complement representation, the MSB acts as the sign bit: 0 for non-negative and 1 for negative.
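As a minimal sketch of the two techniques just described (the function names and the -1 return for a zero input are assumptions made for illustration, not part of the original text):

```c
#include <stdint.h>

/* Isolate the rightmost set bit with n & ~(n - 1).
   For n == 0 the result is 0. */
uint64_t lowest_set_bit(uint64_t n)
{
    return n & ~(n - 1);
}

/* 0-based index of the leftmost set bit, found by right-shifting
   until the value becomes 0. Returns -1 for n == 0 (assumed convention). */
int highest_set_index(uint64_t n)
{
    int shifts = 0;
    while (n != 0) {
        n >>= 1;
        shifts++;
    }
    return shifts - 1;
}
```

For example, lowest_set_bit(0x18) gives 0x8, and highest_set_index(0x80) gives 7 after eight shifts.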
Multiply the value by a carefully designed 64-bit constant, then keep only the upper 4 bits of the product. For any CPU with fast 64-bit multiplication, this is probably about as fast as you can get.
int field_set(uint64_t input) {
uint64_t field = input * 0x20406080a0c0e1ULL;
return (field >> 60) & 15;
}
// field_set(0x0000000000000000ULL) = 0
// field_set(0x0000000000000080ULL) = 1
// field_set(0x0000000000008000ULL) = 2
// field_set(0x0000000000800000ULL) = 3
// field_set(0x0000000080000000ULL) = 4
// field_set(0x0000008000000000ULL) = 5
// field_set(0x0000800000000000ULL) = 6
// field_set(0x0080000000000000ULL) = 7
// field_set(0x8000000000000000ULL) = 8
clang implements this in three x86_64 instructions, not counting the frame setup and cleanup:
_field_set:
push %rbp
mov %rsp,%rbp
movabs $0x20406080a0c0e1,%rax
imul %rdi,%rax
shr $0x3c,%rax
pop %rbp
retq
Note that the results for any other input will be pretty much random. (So don't do that.)
I don't think there's any feasible way to extend this method to return values in the 7..63 range directly (the structure of the constant doesn't permit it), but you can convert a nonzero result to that range by multiplying it by 8 and subtracting 1.
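Since the results 1 through 8 correspond to bits 7, 15, ..., 63, the mapping back to a bit position is 8 * result - 1, with 0 left as 0. A small sketch of that conversion follows; field_index and field_to_pos are hypothetical names, and the multiplier is the one from field_set() above:

```c
#include <stdint.h>

/* Same multiply-and-shift as field_set() above: yields 0..8
   for the nine expected inputs. */
static int field_index(uint64_t input)
{
    uint64_t field = input * 0x20406080a0c0e1ULL;
    return (int)(field >> 60);
}

/* Hypothetical helper: 0 when no bit is set, otherwise 7, 15, ..., 63. */
int field_to_pos(uint64_t input)
{
    int k = field_index(input);
    return k ? k * 8 - 1 : 0;
}
```

If the branch matters, (k << 3) - (k != 0) computes the same value without a conditional.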
With regard to how this constant was designed, I started with the following observations:
Multiplying any number by 1ULL<<63 (i.e., your "pos=63" value) can only possibly result in that same value, or zero. (The product cannot have any lower bits set, and there are no higher bits to change.) Therefore, we must find some way for this value to be treated as the correct result.
Multiplying our constant by each of the other bit fields is equivalent to left-shifting it by a number of bits equal to its "position". The right-shift by 60 bits causes only the 4 bits to the left of a given position to appear in the result. Thus, we can create all of the cases except for one as follows:
uint64_t constant = (
1ULL << (60 - 7)
| 2ULL << (60 - 15)
| 3ULL << (60 - 23)
| 4ULL << (60 - 31)
| 5ULL << (60 - 39)
| 6ULL << (60 - 47)
| 7ULL << (60 - 55)
);
So far, the constant is 0x20406080a0c0e0ULL. However, this doesn't give the right result for pos=63; this constant is even, so multiplying it by that input gives zero. We must set the lowest bit (i.e., constant |= 1ULL) to get that case to work, giving us the final value of 0x20406080a0c0e1ULL.
Note that the construction above can be modified to encode the results differently. However, the output of 8 is fixed as described above, and all other outputs must fit into 4 bits (i.e., 0 to 15).
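The construction can be checked mechanically. The sketch below rebuilds the multiplier in a loop and confirms that each expected input lands on its expected field value; build_constant and constant_is_consistent are names invented for this illustration:

```c
#include <stdint.h>

/* Rebuild the multiplier following the construction described above. */
uint64_t build_constant(void)
{
    uint64_t c = 0;
    for (int k = 1; k <= 7; k++) {
        int pos = 8 * k - 1;             /* 7, 15, ..., 55 */
        c |= (uint64_t)k << (60 - pos);  /* k ends up in the top 4 bits */
    }
    return c | 1ULL;                     /* low bit handles pos = 63 */
}

/* Returns 1 if the rebuilt constant matches the published one and
   every input 1ULL << (8k - 1) maps to k, for k = 1..8. */
int constant_is_consistent(void)
{
    uint64_t c = build_constant();
    if (c != 0x20406080a0c0e1ULL)
        return 0;
    for (int k = 1; k <= 8; k++) {
        uint64_t input = 1ULL << (8 * k - 1);
        if (((input * c) >> 60) != (uint64_t)k)
            return 0;
    }
    return 1;
}
```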
Here is a portable solution that will, however, be slower than solutions taking advantage of specialized instructions such as clz (count leading zeros). I added comments at each step of the algorithm that explain how it works.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* return position of set bit, if exactly one of bits n*8-1 is set; n in [1,8]
return 0 if no bit is set
*/
int bit_pos (uint64_t a)
{
uint64_t t, c;
t = a - 1; // create mask
c = t >> 63; // correction for zero inputs
t = t + c; // apply zero correction if necessary
t = t & 0x0101010101010101ULL; // mark each byte covered by mask
t = t * 0x0101010101010101ULL; // sum the byte markers in uppermost byte
t = (t >> 53) - 1; // retrieve count and diminish by 1 for bit position
t = t + c; // apply zero correction if necessary
return (int)t;
}
int main (void)
{
int i;
uint64_t a;
a = 0;
printf ("a=%016llx bit_pos=%2d reference_pos=%2d\n", (unsigned long long)a, bit_pos(a), 0);
for (i = 7; i < 64; i += 8) {
a = (1ULL << i);
printf ("a=%016llx bit_pos=%2d reference_pos=%2d\n",
(unsigned long long)a, bit_pos(a), i);
}
return EXIT_SUCCESS;
}
The output of this code should look like this:
a=0000000000000000 bit_pos= 0 reference_pos= 0
a=0000000000000080 bit_pos= 7 reference_pos= 7
a=0000000000008000 bit_pos=15 reference_pos=15
a=0000000000800000 bit_pos=23 reference_pos=23
a=0000000080000000 bit_pos=31 reference_pos=31
a=0000008000000000 bit_pos=39 reference_pos=39
a=0000800000000000 bit_pos=47 reference_pos=47
a=0080000000000000 bit_pos=55 reference_pos=55
a=8000000000000000 bit_pos=63 reference_pos=63
On an x86_64 platform, my compiler translates bit_pos() into this machine code:
bit_pos PROC
lea r8, QWORD PTR [-1+rcx]
shr r8, 63
mov r9, 0101010101010101H
lea rdx, QWORD PTR [-1+r8+rcx]
and rdx, r9
imul r9, rdx
shr r9, 53
lea rax, QWORD PTR [-1+r8+r9]
ret
[Later update]
The answer by duskwuff made it clear to me that my original thinking was unnecessarily convoluted. In fact, using duskwuff's approach, the desired functionality can be expressed much more concisely as follows:
/* return position of set bit, if exactly one of bits n*8-1 is set; n in [1,8]
return 0 if no bit is set
*/
int bit_pos (uint64_t a)
{
const uint64_t magic_multiplier =
(( 7ULL << 56) | (15ULL << 48) | (23ULL << 40) | (31ULL << 32) |
(39ULL << 24) | (47ULL << 16) | (55ULL << 8) | (63ULL << 0));
return (int)(((a >> 7) * magic_multiplier) >> 56);
}
Any reasonable compiler will precompute the magic multiplier, which is 0x070f171f272f373fULL. The code emitted for an x86_64 target shrinks to
bit_pos PROC
mov rax, 070f171f272f373fH
shr rcx, 7
imul rax, rcx
shr rax, 56
ret
If you can use POSIX, use the ffs() function from strings.h (not string.h!). It returns the position of the least significant bit set (one-indexed), or zero if the argument is zero. On most implementations, a call to ffs() is inlined and compiled into the corresponding machine instruction, like bsf on x86. glibc also has ffsll() for long long arguments, which should be even more suitable for your problem if available.
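A sketch of how this might look using only ffs(), which takes an int, by checking the two 32-bit halves separately. pos_via_ffs is a hypothetical name, and returning 0 for a zero input is taken from the problem statement rather than from ffs() itself:

```c
#include <stdint.h>
#include <strings.h>   /* ffs() is declared here on POSIX systems */

/* 0-based position of the lowest set bit, or 0 if no bit is set.
   Note: converting 0x80000000 to int is implementation-defined in C;
   common compilers wrap it to INT_MIN, which keeps the bit pattern. */
int pos_via_ffs(uint64_t a)
{
    uint32_t lo = (uint32_t)a;
    uint32_t hi = (uint32_t)(a >> 32);

    if (lo != 0)
        return ffs((int)lo) - 1;        /* ffs() is one-indexed */
    if (hi != 0)
        return ffs((int)hi) - 1 + 32;
    return 0;
}
```

Where ffsll() is available, the body reduces to a single call with the same minus-one adjustment.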
The value mod 0x8C yields a unique value for each of the cases.
This value mod 0x11 is still unique.
The second value in the table is the resulting mod 0x11.
128 9
32768 5
8388608 10
2147483648 0
549755813888 14
140737488355328 2
36028797018963968 4
9223372036854775808 15
So a simple lookup table will suffice.
int find_bit(uint64_t bit){
int lookup[] = { the seventeen values };
return lookup[ (bit % 0x8C) % 0x11];
}
No branching, no compiler tricks.
For completeness, the array is
{ 31, 0, 47, 15, 55, 0, 0, 7, 23, 0, 0, 0, 39, 63, 0, 0}
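The remainders listed above are hard to reproduce by hand: recomputing them, 2^7 and 2^31 both leave 128 mod 0x8C, and the array shown has sixteen entries rather than seventeen. As a hedged alternative that keeps the same idea, the sketch below swaps in a single modulus of 19, for which the eight inputs leave the distinct remainders 14, 12, 13, 3, 8, 15, 2 and 18 (and 0 stays 0). The modulus, the table and the name find_bit_mod19 are assumptions of this sketch, not the constants from the answer:

```c
#include <stdint.h>

/* Assumption: one reduction mod 19 replaces the two-step reduction above.
   Remainders: 2^7 -> 14, 2^15 -> 12, 2^23 -> 13, 2^31 -> 3,
               2^39 -> 8, 2^47 -> 15, 2^55 -> 2,  2^63 -> 18.
   Unused slots, and slot 0 for a zero input, hold 0. */
int find_bit_mod19(uint64_t bit)
{
    static const int lookup[19] = {
         0,  0, 55, 31,  0,  0,  0,  0, 39,  0,
         0,  0, 15, 23,  7, 47,  0,  0, 63
    };
    return lookup[bit % 19];
}
```

As with the other lookup approaches, inputs outside the nine expected values return whatever slot they happen to hit.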
If you want an algorithm for the job rather than a built-in, this will do it. It yields the bit number of the most significant 1 bit, even if more than one bit is set. It narrows down the position by iteratively dividing the bit range under consideration into halves, testing whether there are any bits set in the upper half, taking that half as the new bit range if so, and otherwise taking the lower half as the new bit range.
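/* If any bit of n at or above position `bits` is set, keep that upper part and add `bits` to msb. */
#define TRY_WINDOW(bits, n, msb) do { \
uint64_t t = (n) >> (bits); \
if (t) { \
(msb) += (bits); \
(n) = t; \
} \
} while (0)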
int msb(uint64_t n) {
int msb = 0;
TRY_WINDOW(32, n, msb);
TRY_WINDOW(16, n, msb);
TRY_WINDOW( 8, n, msb);
TRY_WINDOW( 4, n, msb);
TRY_WINDOW( 2, n, msb);
TRY_WINDOW( 1, n, msb);
return msb;
}
The C++ tag was removed, but here is a portable C++ answer nonetheless, since you can compile it with C++ and use an extern "C" interface:
If you have a power of 2 and you subtract one, you end up with a binary number whose number of set bits equals the position.
A way to count the number of set bits (binary 1s) is wrapped, presumably most efficiently by each implementation of the STL, in the std::bitset member function count.
Note that your specification has 0 returned for an input of both 0 and 1, so I added as_specified_pos to meet this requirement. Personally, I would just leave it returning the natural value of 64 when passed 0, to be able to differentiate, and for the speed.
The following code should be extremely portable and most likely optimized per platform by compiler vendors:
#include <bitset>
#include <cstdint>
uint64_t pos(uint64_t val)
{
return std::bitset<64>(val-1).count();
}
uint64_t as_specified_pos(uint64_t val)
{
return (val) ? pos(val) : 0;
}
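For readers staying in C, the same idea (the population count of val - 1) can be sketched without std::bitset. The names below are made up for this example, and the bit-counting loop is a plain portable one rather than anything the answer prescribes:

```c
#include <stdint.h>

/* Count set bits by clearing the lowest set bit on each iteration. */
static int popcount64(uint64_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* For a power of two, val - 1 has exactly `position` low bits set.
   Returns 0 for val == 0, matching as_specified_pos above. */
int pos_by_popcount(uint64_t val)
{
    return val ? popcount64(val - 1) : 0;
}
```

The loop runs once per set bit, so up to 63 iterations for the highest input; a hardware popcount, as in the disassembly that follows, avoids that cost.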
On Linux with g++ I get the following disassembled code:
0000000000000000 <pos(unsigned long)>:
0: 48 8d 47 ff lea -0x1(%rdi),%rax
4: f3 48 0f b8 c0 popcnt %rax,%rax
9: c3 retq
a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000000010 <as_specified_pos(unsigned long)>:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 74 09 je 20 <as_specified_pos(unsigned long)+0x10>
17: 48 8d 47 ff lea -0x1(%rdi),%rax
1b: f3 48 0f b8 c0 popcnt %rax,%rax
20: f3 c3 repz retq
Modern hardware has specialized instructions for that (LZCNT, TZCNT on Intel processors).
Most compilers have intrinsics to easily generate them. See the following wikipedia page.
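As one concrete case, GCC and Clang provide __builtin_ctzll and __builtin_clzll, which typically compile down to those instructions. This sketch is compiler-specific (other compilers use different intrinsics), and the wrapper names are hypothetical:

```c
#include <stdint.h>

/* GCC/Clang only. Both builtins are undefined for a zero argument,
   so that case is handled explicitly and returns 0. */
int pos_by_ctz(uint64_t val)
{
    return val ? __builtin_ctzll(val) : 0;       /* count trailing zeros */
}

int msb_by_clz(uint64_t val)
{
    return val ? 63 - __builtin_clzll(val) : 0;  /* count leading zeros */
}
```

With exactly one bit set, both functions return the same position.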