I'm trying to take two bitset objects, for example
a = 10010111
b = 01110010
and remove bits from both variables if they match in the same position/index. So we'd be left with
a = 100xx1x1 = 10011
b = 011xx0x0 = 01100
Is there any way to achieve this?
Other answers have shown nice, idiomatic C++ ways of doing this. Unfortunately, they are going to be rather slow. Even AndyG's clever template-based solution, although it does do as much of the work as possible at compile time, still causes the compiler to generate a lot of code that must be executed at runtime.
If you care about speed and are targeting a processor that supports the BMI2 instruction set (which would be Intel Haswell and later, or AMD Excavator and later), then you can use the PEXT instruction, which performs a parallel bit extraction. This allows you to literally solve the entire problem in about two machine instructions.
Since you're not writing in assembly, you would use the corresponding intrinsic for the PEXT instruction, which is _pext_u32. In its basic form, the code is simple, readable, and extremely efficient:
#include <stdint.h>      // for uint32_t
#include <x86intrin.h>   // for _pext_u32() [on MSVC, drop the 'x86']

void RemoveMatchingBits(uint32_t& a, uint32_t& b)
{
    const uint32_t mask = (a ^ b);
    a = _pext_u32(a, mask);
    b = _pext_u32(b, mask);
}
First, you bitwise-XOR the two values (a and b) together. This generates a mask, where each bit in the mask is set if the corresponding bits of a and b differ, and clear if they match. This mask is then used as the basis for the bit extraction performed by _pext_u32. The same mask is used for both bit-extraction operations, so only a single XOR instruction is required. Each _pext_u32 intrinsic will compile to a PEXT instruction. So, aside from some MOV instructions to shuffle around values (which will depend on the compiler used to generate the code and whether this code is inlined), there are only three machine-code instructions required. Here's how contemporary versions of GCC and Clang compile the above function (MSVC and ICC emit code that is extremely similar):
RemoveMatchingBits(unsigned int&, unsigned int&):
    mov   eax, DWORD PTR [rdi]   // rdi contains a pointer to 'a'
    mov   edx, DWORD PTR [rsi]   // rsi contains a pointer to 'b'
    xor   edx, eax
    pext  eax, eax, edx
    mov   DWORD PTR [rdi], eax
    mov   eax, DWORD PTR [rsi]
    pext  eax, eax, edx
    mov   DWORD PTR [rsi], eax
    ret
As you can see, most of the extra instructions here are MOVs, mandated by the way that we've written the function to accept its arguments by-reference and modify those values in place. Tweaking how the function is written, and/or getting the optimizer to inline it at the call site, will yield an even more efficient implementation.
If you want to use a std::bitset, just modify the code slightly. The to_ulong() member function allows you to access the raw bits for manipulation. Something like:
void RemoveMatchingBits(std::bitset<8>& a, std::bitset<8>& b)
{
    const std::bitset<8> mask = (a ^ b);
    a = _pext_u32(static_cast<uint32_t>(a.to_ulong()), static_cast<uint32_t>(mask.to_ulong()));
    b = _pext_u32(static_cast<uint32_t>(b.to_ulong()), static_cast<uint32_t>(mask.to_ulong()));
}
Note that this further decreases the efficiency of the generated code, given the need to deal with the std::bitset object. In particular, the to_ulong() member function has to detect and throw an exception in the case of overflow, and MSVC seems incapable of optimizing that check out, even though a std::bitset<8> cannot possibly overflow a 32-bit integer type. Oh well: the code will be fast enough, and no one said abstractions were completely free.
If you cannot compile assuming BMI2 support, you can check at runtime using the CPUID instruction (virtually all x86 compilers provide an intrinsic for this).
If it is not available, you are not targeting x86, or if you just don't want to worry about the complexity of run-time delegation, then you can fall back to an alternative bit-twiddling implementation. Specifically, what you want is a "compress" operation. Discussion and code for this is given in section 7–4 of Henry S. Warren, Jr.'s classic book, Hacker's Delight.
Here is a straightforward, loop-based implementation of "compress", adapted from Figure 7–9 in Hacker's Delight:
uint32_t compress(uint32_t value, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t shift  = 0;
    uint32_t maskBit;
    do
    {
        maskBit  = (mask & 1);
        result  |= ((value & maskBit) << shift);
        shift   += maskBit;
        value  >>= 1;
        mask   >>= 1;
    } while (mask != 0);
    return result;
}
This adequately simulates the PEXT instruction, but it isn't fast. The following code implements the same algorithm, but uses a faster "parallel suffix" method based on Figure 7–10 in Hacker's Delight:
uint32_t fallback_pext_u32(uint32_t value, uint32_t mask)
{
    const int log2BitSize = 5;    // log_2 of the bit size (here, 32 bits)
    value &= mask;                // clear irrelevant bits
    uint32_t mk = (~mask << 1);   // we will count 0's to the right
    uint32_t mp;
    uint32_t mv;
    uint32_t t;
    for (int i = 0; i < log2BitSize; ++i)
    {
        mp = mk ^ (mk << 1);      // parallel suffix
        mp = mp ^ (mp << 2);
        mp = mp ^ (mp << 4);
        mp = mp ^ (mp << 8);
        mp = mp ^ (mp << 16);
        mv = (mp & mask);         // bits to move
        mask = ((mask ^ mv) | (mv >> (1 << i)));    // compress mask
        t = (value & mv);
        value = ((value ^ t) | (t >> (1 << i)));    // compress value
        mk &= ~mp;
    }
    return value;
}
This fallback implementation will be slower than a single PEXT instruction, but it is completely branchless, so there won't be any hidden penalties for mispredicted branches when dealing with random input. You should get the maximum possible throughput from your CPU here; either way, it will certainly be much faster than a for loop with a series of conditional branches, as proposed by the other answers.
You could use boost::dynamic_bitset<> for the result; then, using push_back, you can create the bitset dynamically.
#include <iostream>
#include <boost/dynamic_bitset.hpp>
#include <bitset>

int main()
{
    const int N = 8;
    boost::dynamic_bitset<> a_out(0);
    boost::dynamic_bitset<> b_out(0);
    std::bitset<N> a(0x97); // 10010111
    std::bitset<N> b(0x72); // 01110010
    for (int i = 0; i < N; i++)
    {
        if (a[i] != b[i])
        {
            a_out.push_back(bool(a[i]));
            b_out.push_back(bool(b[i]));
        }
    }
    std::cout << a_out << "\n";
    std::cout << b_out << "\n";
    return 0;
}
Output:
10011
01100
[EDITED]
And if you want to optimize, you can add this before the for loop (but you must have Boost 1.62 or newer to use reserve()):
//@5gon12eder Optimization
const auto xorified = a ^ b;
const auto n = xorified.count();
a_out.reserve(n);
b_out.reserve(n);
And inside the for loop, compare the bits as:
if (xorified[i]) { ... }