How to find the position of the only-set-bit in a 64-bit value using bit manipulation efficiently?

Multiply the value by a carefully designed 64-bit constant, then take the upper 4 bits of the product. For any CPU with fast 64-bit multiplication, this is probably about as fast as you can get.

#include <stdint.h>

int field_set(uint64_t input) {
    uint64_t field = input * 0x20406080a0c0e1ULL;
    return (field >> 60) & 15;
}

// field_set(0x0000000000000000ULL) = 0
// field_set(0x0000000000000080ULL) = 1
// field_set(0x0000000000008000ULL) = 2
// field_set(0x0000000000800000ULL) = 3
// field_set(0x0000000080000000ULL) = 4
// field_set(0x0000008000000000ULL) = 5
// field_set(0x0000800000000000ULL) = 6
// field_set(0x0080000000000000ULL) = 7
// field_set(0x8000000000000000ULL) = 8

clang implements this in three x86_64 instructions, not counting the frame setup and cleanup:

_field_set:
    push   %rbp
    mov    %rsp,%rbp
    movabs $0x20406080a0c0e1,%rax
    imul   %rdi,%rax
    shr    $0x3c,%rax
    pop    %rbp
    retq

Note that the results for any other input will be pretty much random. (So don't do that.)

I don't think there's any feasible way to extend this method to return values in the 7..63 range directly (the structure of the constant doesn't permit it), but you can convert a nonzero result r to that range by computing 8 * r - 1, so that 1 maps to 7 and 8 maps to 63.
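A minimal sketch of that conversion, assuming the input is zero or has exactly one of bits 7, 15, ..., 63 set. Field k (1 through 8) corresponds to bit 8k - 1, so a nonzero field maps to 8 * k - 1. The helper name `field_to_pos` is hypothetical, and `field_set` is repeated from above only so the block compiles on its own.

```c
#include <stdint.h>

/* Repeated from the answer above: returns 0..8 for valid inputs. */
int field_set(uint64_t input) {
    uint64_t field = input * 0x20406080a0c0e1ULL;
    return (int)((field >> 60) & 15);
}

/* Hypothetical helper: field k (1..8) stands for bit 8*k - 1; 0 stays 0. */
int field_to_pos(uint64_t input) {
    int k = field_set(input);
    return k ? 8 * k - 1 : 0;
}
```

For example, `field_to_pos(1ULL << 15)` gives 15 and `field_to_pos(0)` gives 0.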

With regard to how this constant was designed: I started with the following observations:

1. Unsigned multiplication is a fast operation on most CPUs, and can have useful effects. We should use it. :)
2. Multiplying anything by zero results in zero. Since this matches with the desired result for a no-bits-set input, we're doing well so far.
3. Multiplying anything by 1ULL<<63 (i.e., your "pos=63" value) can only possibly result in the same value, or zero. (It cannot possibly have any lower bits set, and there are no higher bits to change.) Therefore, we must find some way for this value to be treated as the correct result.
4. A convenient way of making this value be its own correct result is by right-shifting it by 60 bits. This shifts it down to "8", which is a convenient enough representation. We can proceed to encode the other outputs as 1 through 7.
5. Multiplying our constant by each of the other bit fields is equivalent to left-shifting it by a number of bits equal to its "position". The right-shift by 60 bits causes only the 4 bits to the left of a given position to appear in the result. Thus, we can create all of the cases except for one as follows:
```
 uint64_t constant = (
      1ULL << (60 - 7)
    | 2ULL << (60 - 15)
    | 3ULL << (60 - 23)
    | 4ULL << (60 - 31)
    | 5ULL << (60 - 39)
    | 6ULL << (60 - 47)
    | 7ULL << (60 - 55)
 );
```

So far, the constant is 0x20406080a0c0e0ULL. However, this doesn't give the right result for pos=63; this constant is even, so multiplying it by that input gives zero. We must set the lowest bit (i.e., constant |= 1ULL) to get that case to work, giving us the final value of 0x20406080a0c0e1ULL.
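The construction can also be written as a loop, which makes the pattern easier to check. This is only a sketch: the function names `build_field_constant` and `field_from_constant` are hypothetical, and the loop assumes the same encoding as above (field k answers for bit 8k - 1).

```c
#include <stdint.h>

/* Rebuild the multiplier: field k (1..7) answers for bit 8*k - 1, so k is
   placed where a left shift by 8*k - 1 moves it up to bit 60. */
uint64_t build_field_constant(void) {
    uint64_t constant = 0;
    for (int k = 1; k <= 7; k++) {
        constant |= (uint64_t)k << (60 - (8 * k - 1));
    }
    constant |= 1ULL;   /* odd constant: the bit-63 input survives as 8 */
    return constant;
}

/* Same extraction as field_set, using the rebuilt constant. */
int field_from_constant(uint64_t input) {
    return (int)((input * build_field_constant()) >> 60);
}
```

The rebuilt value equals 0x20406080a0c0e1ULL, the constant used in `field_set`.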

Note that the construction above can be modified to encode the results differently. However, the output of 8 is fixed as described above, and all other output must fit into 4 bits (i.e, 0 to 15).

Here is a portable solution, that will, however, be slower than solutions taking advantage of specialized instructions such as clz (count leading zeros). I added comments at each step of the algorithm that explain how it works.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* return position of set bit, if exactly one of bits n*8-1 is set; n in [1,8]
   return 0 if no bit is set
*/
int bit_pos (uint64_t a)
{
    uint64_t t, c;
    t = a - 1; // create mask
    c = t >> 63; // correction for zero inputs
    t = t + c; // apply zero correction if necessary
    t = t & 0x0101010101010101ULL; // mark each byte covered by mask
    t = t * 0x0101010101010101ULL; // sum the byte markers in uppermost byte
    t = (t >> 53) - 1; // retrieve count and diminish by 1 for bit position
    t = t + c; // apply zero correction if necessary
    return (int)t;
}

int main (void)
{
    int i;
    uint64_t a;
    a = 0;
    printf ("a=%016llx   bit_pos=%2d   reference_pos=%2d\n",
            (unsigned long long)a, bit_pos(a), 0);
    for (i = 7; i < 64; i += 8) {
        a = (1ULL << i);
        printf ("a=%016llx   bit_pos=%2d   reference_pos=%2d\n", 
                (unsigned long long)a, bit_pos(a), i);
    }
    return EXIT_SUCCESS;
}

The output of this code should look like this:

a=0000000000000000   bit_pos= 0   reference_pos= 0
a=0000000000000080   bit_pos= 7   reference_pos= 7
a=0000000000008000   bit_pos=15   reference_pos=15
a=0000000000800000   bit_pos=23   reference_pos=23
a=0000000080000000   bit_pos=31   reference_pos=31
a=0000008000000000   bit_pos=39   reference_pos=39
a=0000800000000000   bit_pos=47   reference_pos=47
a=0080000000000000   bit_pos=55   reference_pos=55
a=8000000000000000   bit_pos=63   reference_pos=63

On an x86_64 platform, my compiler translates bit_pos() into this machine code:

bit_pos PROC 
        lea       r8, QWORD PTR [-1+rcx]
        shr       r8, 63
        mov       r9, 0101010101010101H
        lea       rdx, QWORD PTR [-1+r8+rcx]
        and       rdx, r9
        imul      r9, rdx
        shr       r9, 53
        lea       rax, QWORD PTR [-1+r8+r9]
        ret

[Later update]

The answer by duskwuff made it clear to me that my original thinking was unnecessarily convoluted. In fact, using duskwuff's approach, the desired functionality can be expressed much more concisely as follows:

/* return position of set bit, if exactly one of bits n*8-1 is set; n in [1,8]
   return 0 if no bit is set
*/
int bit_pos (uint64_t a)
{
    const uint64_t magic_multiplier = 
         (( 7ULL << 56) | (15ULL << 48) | (23ULL << 40) | (31ULL << 32) |
          (39ULL << 24) | (47ULL << 16) | (55ULL <<  8) | (63ULL <<  0));
    return (int)(((a >> 7) * magic_multiplier) >> 56);
}

Any reasonable compiler will precompute the magic multiplier, which is 0x070f171f272f373fULL. The code emitted for an x86_64 target shrinks to

bit_pos PROC 
        mov       rax, 070f171f272f373fH
        shr       rcx, 7
        imul      rax, rcx
        shr       rax, 56
        ret
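To see why this multiplier works: after `a >> 7` a valid input is either 0 or 1 << 8j, so the multiplication shifts the multiplier left by whole bytes, and the byte that lands on top is the one holding the answer. The sketch below rebuilds the multiplier in a loop and adds a naive reference to compare against; all three function names are hypothetical.

```c
#include <stdint.h>

/* Byte j counted from the top (j = 0..7) holds the answer 8*j + 7. */
uint64_t build_magic_multiplier(void) {
    uint64_t m = 0;
    for (int j = 0; j < 8; j++) {
        m |= (uint64_t)(8 * j + 7) << (56 - 8 * j);
    }
    return m;
}

/* Naive reference: test the eight candidate bits one by one. */
int bit_pos_reference(uint64_t a) {
    for (int i = 7; i < 64; i += 8) {
        if (a == (1ULL << i)) return i;
    }
    return 0;
}

/* Multiply-and-shift version using the rebuilt multiplier. */
int bit_pos_magic(uint64_t a) {
    return (int)(((a >> 7) * build_magic_multiplier()) >> 56);
}
```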

If you can use POSIX, use the ffs() function from strings.h (not string.h!). It returns the position of the least significant set bit (one-indexed), or zero if the argument is zero. On most implementations, a call to ffs() is inlined and compiled into the corresponding machine instruction, like bsf on x86. Note that ffs() takes an int, so it cannot cover a 64-bit value in one call; glibc also has ffsll() for long long arguments, which should be even more suitable for your problem if available.
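A sketch that stays within plain POSIX ffs() by scanning the value in 16-bit chunks, so each chunk fits comfortably in an int. The function name `bit_pos_ffs` is hypothetical. Where ffsll() is available, a single call minus one (with the zero case handled) would replace the loop.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() is declared here, not in <string.h> */

/* Returns the zero-based position of the lowest set bit, or 0 if a == 0. */
int bit_pos_ffs(uint64_t a) {
    for (int shift = 0; shift < 64; shift += 16) {
        int chunk = (int)((a >> shift) & 0xffff);
        if (chunk != 0) {
            /* ffs() is one-indexed, so subtract 1. */
            return ffs(chunk) - 1 + shift;
        }
    }
    return 0;
}
```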

The value mod 0x8C yields a unique value for each of the cases.

This value mod 0x11 is still unique.

The second value in the table is the resulting mod 0x11.

128                    9
32768                  5
8388608               10
2147483648             0
549755813888          14
140737488355328        2
36028797018963968      4
9223372036854775808   15

So a simple lookup table will suffice.

int find_bit(uint64_t bit){ 
  int lookup[] = { the seventeen values };
  return lookup[ (bit % 0x8C) % 0x11];
}

No branching, no compiler tricks.

For completeness, the array is

{ 31, 0, 47, 15, 55, 0, 0, 7, 23, 0, 0, 0, 39, 63, 0, 0}
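A runnable sketch of the modulus-plus-lookup idea. It swaps in a single modulus, 19, which is my substitution and not the constants quoted above: modulo 0x8C the powers of two repeat with period 12, so 1<<7 and 1<<31 leave the same remainder (128), whereas 2^8 has multiplicative order 9 modulo 19, so the eight candidate values land in eight distinct slots and 0 lands in slot 0. The names `pos_by_mod19` and `find_bit_mod19` are hypothetical.

```c
#include <stdint.h>

/* Indexed by (bit % 19). Slots not listed, including slot 0 for a zero
   input, are zero-initialized. */
static const unsigned char pos_by_mod19[19] = {
    [14] = 7,  [12] = 15, [13] = 23, [3]  = 31,
    [8]  = 39, [15] = 47, [2]  = 55, [18] = 63
};

int find_bit_mod19(uint64_t bit) {
    return pos_by_mod19[bit % 19];
}
```

Compilers typically lower a modulo by a constant into a multiply and a few shifts, so this stays branch-free, though it is unlikely to beat the single multiply shown earlier.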

If you want an algorithm for the job rather than a built-in, this will do it. It yields the bit number of the most significant 1 bit, even if more than one bit is set. It narrows down the position by iteratively dividing the bit range under consideration into halves, testing whether there are any bits set in the upper half, taking that half as the new bit range if so, and otherwise taking the lower half as the new bit range.

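#include <stdint.h>

/* Macro arguments are parenthesized so expressions can be passed safely. */
#define TRY_WINDOW(bits, n, msb) do { \
    uint64_t t = (n) >> (bits);       \
    if (t) {                          \
        (msb) += (bits);              \
        (n) = t;                      \
    }                                 \
} while (0)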

int msb(uint64_t n) {
    int msb = 0;

    TRY_WINDOW(32, n, msb);
    TRY_WINDOW(16, n, msb);
    TRY_WINDOW( 8, n, msb);
    TRY_WINDOW( 4, n, msb);
    TRY_WINDOW( 2, n, msb);
    TRY_WINDOW( 1, n, msb);

    return msb;
}

C++ tag was removed, but here is a portable C++ answer nonetheless since you can compile it with C++ and use an extern C interface:

If you have a power of 2 and subtract one, you end up with a binary number whose count of set bits equals the position of the original bit.

Counting set bits (binary 1s) is wrapped, presumably as efficiently as each standard library implementation can manage, in the std::bitset member function count.

Note that your specification returns 0 for both an input of 0 and an input of 1, so I added as_specified_pos to meet this requirement. Personally, I would just let it return the natural value of 64 when passed 0, both to be able to tell the two cases apart and for speed.

The following code should be extremely portable and most likely optimized per platform by compiler vendors:

#include <bitset>
#include <cstdint>

uint64_t pos(uint64_t val)
{
   return std::bitset<64>(val-1).count();
}

uint64_t as_specified_pos(uint64_t val)
{
    return (val) ? pos(val) : 0;
}

On Linux with g++ I get the following disassembled code:

0000000000000000 <pos(unsigned long)>:
   0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
   4:   f3 48 0f b8 c0          popcnt %rax,%rax
   9:   c3                      retq
   a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

0000000000000010 <as_specified_pos(unsigned long)>:
  10:   31 c0                   xor    %eax,%eax
  12:   48 85 ff                test   %rdi,%rdi
  15:   74 09                   je     20 <as_specified_pos(unsigned long)+0x10>
  17:   48 8d 47 ff             lea    -0x1(%rdi),%rax
  1b:   f3 48 0f b8 c0          popcnt %rax,%rax
  20:   f3 c3                   repz retq
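The same idea can be written in C. This sketch assumes GCC or Clang, whose `__builtin_popcountll` builtin stands in for `std::bitset::count`; the function names `pos_popcount` and `as_specified_pos_c` are hypothetical.

```c
#include <stdint.h>

/* Number of set bits in val - 1. For a power of two this is the bit
   position; for val == 0 the subtraction wraps to all ones, giving 64. */
int pos_popcount(uint64_t val) {
    return __builtin_popcountll(val - 1);
}

/* Variant that returns 0 for a zero input, as the question specifies. */
int as_specified_pos_c(uint64_t val) {
    return val ? pos_popcount(val) : 0;
}
```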

Modern hardware has specialized instructions for that (LZCNT, TZCNT on Intel processors).

Most compilers have intrinsics to easily generate them. See the following wikipedia page.
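As one example of such an intrinsic, here is a sketch using `__builtin_ctzll`, which assumes GCC or Clang. On x86-64 it is typically compiled to a bit-scan or TZCNT instruction, depending on the target flags. The function name `bit_pos_ctz` is hypothetical.

```c
#include <stdint.h>

/* Zero-based position of the lowest set bit, or 0 for a zero input. */
int bit_pos_ctz(uint64_t a) {
    /* __builtin_ctzll is undefined for 0, so handle that case first. */
    return a ? __builtin_ctzll(a) : 0;
}
```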


Tags: c, optimization, bit-manipulation
