Detecting matching bits in C++

Tags:

c++

I'm trying to take two bitset objects, for example

a = 10010111
b = 01110010

and remove bits from both variables if they match in the same position/index. So we'd be left with

a = 100xx1x1 = 10011
b = 011xx0x0 = 01100

Is there any way to achieve this?

eddiewastaken asked Jan 18 '17 13:01



2 Answers

Other answers have shown nice, idiomatic C++ ways of doing this. Unfortunately, they are going to be rather slow. Even AndyG's clever template-based solution, although it does do as much of the work as possible at compile time, still causes the compiler to generate a lot of code that must be executed at runtime.

If you care about speed and are targeting a processor that supports the BMI2 instruction set (which would be Intel Haswell and later, or AMD Excavator and later), then you can use the PEXT instruction, which performs a parallel bit extraction. This allows you to solve the entire problem in about three machine instructions.

Since you're not writing in assembly, you would use the corresponding intrinsic for the PEXT instruction, which is _pext_u32. In its basic form, the code is simple, readable, and extremely efficient:

#include <stdint.h>      // for uint32_t
#include <x86intrin.h>   // for _pext_u32()  [on MSVC, drop the 'x86']
void RemoveMatchingBits(uint32_t& a, uint32_t& b)
{
   const uint32_t mask = (a ^ b);
   a = _pext_u32(a, mask);
   b = _pext_u32(b, mask);
}

First, you bitwise-XOR the two values (a and b) together. This generates a mask in which each bit is set if the corresponding bits of a and b differ (that is, the bit is set in exactly one of them) and clear if they match. This mask is then used as the basis for the bit extraction performed by _pext_u32, which gathers the bits of its first operand that are selected by the mask and packs them contiguously into the low-order bits of the result. The same mask is used for both bit-extraction operations, so only a single XOR instruction is required. Each _pext_u32 intrinsic will compile to a PEXT instruction. So, aside from some MOV instructions to shuffle around values (which will depend on the compiler used to generate the code and whether this code is inlined), there are only three machine-code instructions required. Here's how contemporary versions of GCC and Clang compile the above function (MSVC and ICC emit code that is extremely similar):

RemoveMatchingBits(unsigned int&, unsigned int&):
    mov     eax, DWORD PTR [rdi]    // rdi contains a pointer to 'a'
    mov     edx, DWORD PTR [rsi]    // rsi contains a pointer to 'b'
    xor     edx, eax
    pext    eax, eax, edx
    mov     DWORD PTR [rdi], eax
    mov     eax, DWORD PTR [rsi]
    pext    eax, eax, edx
    mov     DWORD PTR [rsi], eax
    ret

As you can see, most of the extra instructions here are MOVs, mandated by the way that we've written the function to accept its arguments by-reference and modify those values in place. Tweaking how the function is written, and/or getting the optimizer to inline it at the call site, will yield an even more efficient implementation.

If you want to use a std::bitset, just modify the code slightly. The to_ulong() member function allows you to access the raw bits for manipulation. Something like:

#include <bitset>        // for std::bitset
void RemoveMatchingBits(std::bitset<8>& a, std::bitset<8>& b)
{
   const std::bitset<8> mask = (a ^ b);
   a = _pext_u32(static_cast<uint32_t>(a.to_ulong()), static_cast<uint32_t>(mask.to_ulong()));
   b = _pext_u32(static_cast<uint32_t>(b.to_ulong()), static_cast<uint32_t>(mask.to_ulong()));
}

Note that this further decreases the efficiency of the generated code, given the need to deal with the std::bitset object. In particular, the to_ulong() member function has to detect and throw an exception in the case of overflow, and MSVC seems incapable of optimizing that check out, even though a std::bitset<8> cannot possibly overflow a 32-bit integer type. Oh well—the code will be fast enough, and no one said abstractions were completely free.


If you cannot compile assuming BMI2 support, you can check at runtime using the CPUID instruction (virtually all x86 compilers provide an intrinsic for this).
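As a rough sketch of what that run-time check might look like on GCC or Clang: the block below assumes the `__builtin_cpu_supports` builtin (which wraps CPUID) and the `target("bmi2")` function attribute, both of which are GCC/Clang extensions rather than standard C++. The names `pext_hw`, `pext_sw`, `pext_dispatch`, `RemoveMatchingBitsPortable`, and the `SKETCH_HAVE_BMI2_TARGET` macro are made up for illustration. On MSVC you would query CPUID yourself instead (BMI2 is reported in leaf 7, sub-leaf 0, EBX bit 8).

```cpp
#include <cassert>
#include <cstdint>

// Only GCC/Clang on x86 get the hardware path in this sketch.
#if (defined(__GNUC__) || defined(__clang__)) && !defined(_MSC_VER) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_BMI2_TARGET 1   // hypothetical macro, used only here

// BMI2 is enabled for this one function, so the rest of the program can
// still be compiled for a baseline x86 target.
__attribute__((target("bmi2")))
static uint32_t pext_hw(uint32_t value, uint32_t mask)
{
    return _pext_u32(value, mask);
}
#endif

// Portable stand-in: a simple loop that packs the mask-selected bits of
// 'value' into the low-order bits of the result.
static uint32_t pext_sw(uint32_t value, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t shift = 0; mask != 0; value >>= 1, mask >>= 1)
    {
        const uint32_t maskBit = mask & 1u;
        result |= (value & maskBit) << shift;
        shift  += maskBit;
    }
    return result;
}

// Picks the hardware instruction when the CPU reports BMI2, else the loop.
uint32_t pext_dispatch(uint32_t value, uint32_t mask)
{
#ifdef SKETCH_HAVE_BMI2_TARGET
    if (__builtin_cpu_supports("bmi2"))
        return pext_hw(value, mask);
#endif
    return pext_sw(value, mask);
}

void RemoveMatchingBitsPortable(uint32_t& a, uint32_t& b)
{
    const uint32_t mask = a ^ b;   // set wherever a and b differ
    a = pext_dispatch(a, mask);
    b = pext_dispatch(b, mask);
    assert((a ^ b) == pext_dispatch(mask, mask));  // surviving bits all differ
}
```

With the question's values (a = 0x97, b = 0x72), this leaves a = 0x13 (10011) and b = 0x0C (01100) on either path. In real code you would probably do the feature check once and cache a function pointer, rather than testing on every call.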

If BMI2 is not available, if you are not targeting x86, or if you just don't want to worry about the complexity of run-time delegation, then you can fall back to an alternative bit-twiddling implementation. Specifically, what you want is a "compress" operation. Discussion and code for this are given in section 7–4 of Henry S. Warren, Jr.'s classic book, Hacker's Delight.

Here is a straightforward, loop-based implementation of "compress", adapted from Figure 7–9 in Hacker's Delight:

uint32_t compress(uint32_t value, uint32_t mask)
{
   uint32_t result = 0;
   uint32_t shift  = 0;
   uint32_t maskBit;
   do
   {
      maskBit = (mask & 1);
      result |= ((value & maskBit) << shift);
      shift  += maskBit;
      value >>= 1;
      mask  >>= 1;
   } while (mask != 0);
   return result;
}

This adequately simulates the PEXT instruction, but it isn't fast. The following code implements the same algorithm, but uses a faster "parallel suffix" method based on Figure 7–10 in Hacker's Delight:

uint32_t fallback_pext_u32(uint32_t value, uint32_t mask)
{
   const int log2BitSize = 5;                     // log_2 of the bit size (here, 32 bits)

   value &= mask;                                 // clear irrelevant bits    
   uint32_t mk = (~mask << 1);                    // we will count 0's to the right
   uint32_t mp;
   uint32_t mv;
   uint32_t t;
   for (int i = 0; i < log2BitSize; ++i)
   {
      mp     = mk ^ (mk <<  1);                   // parallel suffix
      mp     = mp ^ (mp <<  2);
      mp     = mp ^ (mp <<  4);
      mp     = mp ^ (mp <<  8);
      mp     = mp ^ (mp << 16);
      mv     = (mp & mask);                       // bits to move
      mask   = ((mask ^ mv) | (mv >> (1 << i)));  // compress mask
      t      = (value & mv);
      value  = ((value ^ t) | (t >> (1 << i)));   // compress value
      mk    &= ~mp;
   }
   return value;
}

This fallback implementation will be slower than a single PEXT instruction, but it is completely branchless, so there won't be any hidden penalties for mispredicted branches when dealing with random input. You should get the maximum possible throughput from your CPU here, but either way, it will certainly be much faster than a for loop with a series of conditional branches, as proposed by the other answers.
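If you want some reassurance that the two fallbacks really do the same thing, a quick sanity check along these lines may help. It repeats both routines from above so that it stands alone; the helpers `FallbacksAgreeOn8BitInputs` and `CheckQuestionExample` are hypothetical names added purely for illustration, and the exhaustive comparison only covers 8-bit inputs, so treat it as a smoke test rather than a proof.

```cpp
#include <cassert>
#include <cstdint>

// Loop-based "compress": one iteration per remaining mask bit.
uint32_t compress(uint32_t value, uint32_t mask)
{
   uint32_t result = 0;
   uint32_t shift  = 0;
   uint32_t maskBit;
   do
   {
      maskBit = (mask & 1);
      result |= ((value & maskBit) << shift);
      shift  += maskBit;
      value >>= 1;
      mask  >>= 1;
   } while (mask != 0);
   return result;
}

// Branchless "parallel suffix" version: five passes for a 32-bit word.
uint32_t fallback_pext_u32(uint32_t value, uint32_t mask)
{
   const int log2BitSize = 5;
   value &= mask;                                 // clear irrelevant bits
   uint32_t mk = (~mask << 1);                    // count 0's to the right
   for (int i = 0; i < log2BitSize; ++i)
   {
      uint32_t mp = mk ^ (mk << 1);               // parallel suffix
      mp = mp ^ (mp <<  2);
      mp = mp ^ (mp <<  4);
      mp = mp ^ (mp <<  8);
      mp = mp ^ (mp << 16);
      const uint32_t mv = (mp & mask);            // bits to move
      mask  = ((mask ^ mv) | (mv >> (1 << i)));   // compress mask
      const uint32_t t = (value & mv);
      value = ((value ^ t) | (t >> (1 << i)));    // compress value
      mk   &= ~mp;
   }
   return value;
}

// Hypothetical helper: compares both routines on every 8-bit value/mask pair.
bool FallbacksAgreeOn8BitInputs()
{
   for (uint32_t value = 0; value < 256; ++value)
      for (uint32_t mask = 0; mask < 256; ++mask)
         if (compress(value, mask) != fallback_pext_u32(value, mask))
            return false;
   return true;
}

// Hypothetical helper: checks the values from the question.
void CheckQuestionExample()
{
   const uint32_t a = 0x97;                       // 10010111
   const uint32_t b = 0x72;                       // 01110010
   const uint32_t mask = a ^ b;                   // 11100101
   assert(fallback_pext_u32(a, mask) == 0x13u);   // 10011
   assert(fallback_pext_u32(b, mask) == 0x0Cu);   // 01100
}
```

Working the question's example by hand: the mask 11100101 selects bit positions 7, 6, 5, 2 and 0, so a = 10010111 packs down to 10011 and b = 01110010 packs down to 01100, which matches what the asker expected.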

Cody Gray answered Oct 04 '22 14:10


You could use boost::dynamic_bitset<> for the result and build it up one bit at a time with push_back.

#include <iostream>
#include <boost/dynamic_bitset.hpp>
#include <bitset>

int main()
{
    const int N = 8;
    boost::dynamic_bitset<> a_out(0);
    boost::dynamic_bitset<> b_out(0);
    std::bitset<N> a(0x97); // 10010111
    std::bitset<N> b(0x72); // 01110010

    for (int i = 0; i < N; i++)
    {
        if (a[i] != b[i])
        {
            a_out.push_back(bool(a[i]));
            b_out.push_back(bool(b[i]));
        }
    }


    std::cout << a_out << "\n";
    std::cout << b_out << "\n";

    return 0;
}

Output:
10011
01100

[EDITED] And if you want to optimize, you can add this before the for loop (note that you need Boost 1.62 or newer to use reserve()):

//@5gon12eder Optimization
const auto xorified = a ^ b;
const auto n = xorified.count();
a_out.reserve(n); 
b_out.reserve(n);

And inside the for loop compare bits as:

if (xorified[i]) { ... }
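
If Boost isn't an option, the same loop can be sketched with only the standard library. This is a swapped-in variant, not the code from the answer: it collects the surviving bits into std::string instead of boost::dynamic_bitset<>, and the function name `RemoveMatchingBitsAsStrings` is hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Returns the bits of a and b that differ, as '0'/'1' strings written
// most significant bit first.
template <std::size_t N>
std::pair<std::string, std::string>
RemoveMatchingBitsAsStrings(const std::bitset<N>& a, const std::bitset<N>& b)
{
    const auto xorified = a ^ b;        // 1 wherever a and b differ
    const auto n = xorified.count();    // number of bits that survive

    std::string a_out;
    std::string b_out;
    a_out.reserve(n);
    b_out.reserve(n);

    // Walk from the top bit down so the strings read left to right.
    for (std::size_t i = N; i-- > 0; )
    {
        if (xorified[i])
        {
            a_out.push_back(a[i] ? '1' : '0');
            b_out.push_back(b[i] ? '1' : '0');
        }
    }

    assert(a_out.size() == n && b_out.size() == n);
    return {a_out, b_out};
}
```

Calling it with std::bitset<8>(0x97) and std::bitset<8>(0x72) gives the pair ("10011", "01100"), the same output as the Boost version above.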
Rama answered Oct 04 '22 14:10