
Binary Serialization of std::bitset

std::bitset has a to_string() method for serializing as a char-based string of 1s and 0s. Obviously, this uses a single 8-bit char for each bit in the bitset, making the serialized representation 8 times longer than necessary.
I want to store the bitset in a binary representation to save space. The to_ulong() method only helps when the bitset fits in an unsigned long (typically 32 or 64 bits). I have hundreds of bits.
I'm not sure I want to use memcpy()/std::copy() on the object (address) itself, as that assumes the object is a POD.

The API does not seem to provide a handle to the internal array representation from which I could have taken the address.

I would also like the option to deserialize the bitset from the binary representation.

How can I do this?

Adi Shavit asked Mar 09 '11 19:03



1 Answer

This is a possible approach: explicitly build a std::vector<unsigned char>, reading/writing one bit at a time...

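#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Pack the bits into bytes: bit j goes to byte j/8, bit position j%8.
template<std::size_t N>
std::vector<unsigned char> bitset_to_bytes(const std::bitset<N>& bs)
{
    std::vector<unsigned char> result((N + 7) >> 3);  // zero-initialized
    for (std::size_t j=0; j<N; j++)
        result[j>>3] |= static_cast<unsigned char>(bs[j] << (j & 7));
    return result;
}

// Inverse of bitset_to_bytes; buf must hold exactly (N + 7) / 8 bytes.
template<std::size_t N>
std::bitset<N> bitset_from_bytes(const std::vector<unsigned char>& buf)
{
    assert(buf.size() == ((N + 7) >> 3));
    std::bitset<N> result;
    for (std::size_t j=0; j<N; j++)
        result[j] = ((buf[j>>3] >> (j & 7)) & 1);
    return result;
}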

Note that when calling the deserialization function template bitset_from_bytes, the bitset size N must be specified explicitly in the call, because it cannot be deduced from the std::vector argument. For example:

std::bitset<N> bs1;
...
std::vector<unsigned char> buffer = bitset_to_bytes(bs1);
...
std::bitset<N> bs2 = bitset_from_bytes<N>(buffer);
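
To make the byte layout concrete: bit j lands in byte j / 8 at bit position j % 8, so the lowest-numbered bit of each group of eight sits in the least significant bit of its byte. The self-contained sketch below repeats the two functions from this answer and adds a hypothetical helper, round_trips, which is not part of the original answer and only checks that deserializing a serialized bitset gives back the same value.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Same packing as in the answer: bit j -> byte j/8, bit position j%8.
template <std::size_t N>
std::vector<unsigned char> bitset_to_bytes(const std::bitset<N>& bs)
{
    std::vector<unsigned char> result((N + 7) >> 3);
    for (std::size_t j = 0; j < N; j++)
        result[j >> 3] |= static_cast<unsigned char>(bs[j] << (j & 7));
    return result;
}

// Same unpacking as in the answer.
template <std::size_t N>
std::bitset<N> bitset_from_bytes(const std::vector<unsigned char>& buf)
{
    assert(buf.size() == ((N + 7) >> 3));
    std::bitset<N> result;
    for (std::size_t j = 0; j < N; j++)
        result[j] = ((buf[j >> 3] >> (j & 7)) & 1);
    return result;
}

// Hypothetical helper (an assumption for illustration, not in the answer):
// serialize, deserialize, and compare with the input.
template <std::size_t N>
bool round_trips(const std::bitset<N>& bs)
{
    return bitset_from_bytes<N>(bitset_to_bytes(bs)) == bs;
}
```

With this layout, a std::bitset<12> that has bits 0 and 9 set serializes to the two bytes 0x01 and 0x02.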

If you really care about speed, one option that would gain something is to unroll the loop so that the packing is done, for example, one byte at a time. Even better is to write your own bitset implementation that doesn't hide the internal binary representation, instead of using std::bitset.
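
A minimal sketch of the byte-at-a-time idea, under stated assumptions: the function names bitset_to_bytes_bytewise and bitset_from_bytes_bytewise are made up for this illustration, and the byte layout is the same as in the bit-at-a-time version (bit j in byte j / 8, position j % 8). Each output byte is assembled in a local variable and stored once. Whether this is actually faster depends on the compiler and standard library, so measure before relying on it.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name: packs eight bits per iteration, one store per byte.
template <std::size_t N>
std::vector<unsigned char> bitset_to_bytes_bytewise(const std::bitset<N>& bs)
{
    std::vector<unsigned char> result((N + 7) / 8);
    std::size_t j = 0;
    // Full bytes: the eight shifts are written out (unrolled by hand).
    for (std::size_t i = 0; i < N / 8; ++i, j += 8) {
        result[i] = static_cast<unsigned char>(
            (bs[j]     << 0) | (bs[j + 1] << 1) |
            (bs[j + 2] << 2) | (bs[j + 3] << 3) |
            (bs[j + 4] << 4) | (bs[j + 5] << 5) |
            (bs[j + 6] << 6) | (bs[j + 7] << 7));
    }
    // Tail: the remaining N % 8 bits, one at a time.
    for (; j < N; ++j)
        result[j / 8] |= static_cast<unsigned char>(bs[j] << (j % 8));
    return result;
}

// Hypothetical name: reads each byte once, then distributes its bits.
template <std::size_t N>
std::bitset<N> bitset_from_bytes_bytewise(const std::vector<unsigned char>& buf)
{
    assert(buf.size() == (N + 7) / 8);
    std::bitset<N> result;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const unsigned char b = buf[i];
        // Stop early in the last byte if N is not a multiple of 8.
        for (std::size_t k = 0; k < 8 && i * 8 + k < N; ++k)
            result[i * 8 + k] = ((b >> k) & 1) != 0;
    }
    return result;
}
```

The output is meant to be byte-for-byte identical to the bit-at-a-time functions, so the two variants can be mixed when reading and writing.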

6502 answered Oct 04 '22 05:10