Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a portable alternative to C++ bitfields

Tags:

c++

bit-fields

There are many situations (especially in low-level programming), where the binary layout of the data is important. For example: hardware/driver manipulation, network protocols, etc.

In C++ I can read/write arbitrary binary structures using char* and bitwise operations (masks and shifts), but that's tedious and error-prone. Obviously, I try to limit the scope of these operations and encapsulate them in higher-level APIs, but it's still a pain.

C++ bitfields seem to offer a developer-friendly solution to this problem, but unfortunately their storage is implementation specific.

NathanOliver mentionned std::bitset which basically allows you to access individual bits of an integer with a nice operator[] but lacks accessors for multi-bit fields.

Using meta-programming and/or macros, it's possible to abstract the bitwise operations in a library. Since I don't want to reinvent the wheel, I'm looking for a (preferably STL or boost) library that does that.

For the record, I'm looking into this for a DNS resolver, but the problem and its solution should be generic.

Edit: short answer: it turns out bitfield's storage is reliable in practice (even if it's not mandated by the standard) since system/network libraries use them and yeild well behaved programs when compiled with mainstream compilers.

like image 477
Antoine Avatar asked Jul 30 '15 14:07

Antoine


4 Answers

From the C++14 standard (N3797 draft), section 9.6 [class.bit], paragraph 1:

Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]

Although notes are non-normative, every implementation I'm aware of uses one of two layouts: either big-endian or little endian bit order.

Note that:

  • You must specify padding manually. This implies that you must know the size of your types (e.g. by using <cstdint>).
  • You must use unsigned types.
  • The preprocessor macros for detecting the bit order are implementation-dependent.
  • Usually the bit order endianness is the same as the byte order endianness. I believe there is a compiler flag to override it, though, but I can't find it.

For examples, look in netinet/tcp.h and other nearby headers.

Edit by OP: for example tcp.h defines

struct
{
    u_int16_t th_sport;     /* source port */
    u_int16_t th_dport;     /* destination port */
    tcp_seq th_seq;     /* sequence number */
    tcp_seq th_ack;     /* acknowledgement number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
    u_int8_t th_x2:4;       /* (unused) */
    u_int8_t th_off:4;      /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
    u_int8_t th_off:4;      /* data offset */
    u_int8_t th_x2:4;       /* (unused) */
# endif
    // ...
}

And since it works with mainstream compilers, it means bitset's memory layout is reliable in practice.

Edit:

This is portable within one endianness:

struct Foo {
    uint16_t x: 10;
    uint16_t y: 6;
};

But this may not be because it straddles a 16-bit unit:

struct Foo {
    uint16_t x: 10;
    uint16_t y: 12;
    uint16_t z: 10;
};

And this may not be because it has implicit padding:

struct Foo {
    uint16_t x: 10;
};
like image 122
o11c Avatar answered Nov 16 '22 20:11

o11c


We have this in production code where we had to port MIPS code to x86-64

https://codereview.stackexchange.com/questions/54342/template-for-endianness-free-code-data-always-packed-as-big-endian

Works well for us.

It's basically a template without any storage, the template arguments specify the position of the relevant bits.

If you need multiple fields, you put multiple specializations of the template together in a union, together with an array of bytes to provide storage.

The template has overloads for assignment of value and a conversion operator to unsigned for reading the value.

In addition, if the fields are larger than a byte, they are stored in big-endian byte order, which is sometimes useful when implementing cross-platform protocols.

here's a usage example:

union header
{
    unsigned char arr[2];       // space allocation, 2 bytes (16 bits)

    BitFieldMember<0, 4> m1;     // first 4 bits
    BitFieldMember<4, 5> m2;     // The following 5 bits
    BitFieldMember<9, 6> m3;     // The following 6 bits, total 16 bits
};

int main()
{
    header a;
    memset(a.arr, 0, sizeof(a.arr));
    a.m1 = rand();
    a.m3 = a.m1;
    a.m2 = ~a.m1;
    return 0;
}
like image 5
CplusPuzzle Avatar answered Nov 16 '22 22:11

CplusPuzzle


It's simple to implement bit fields with known positions with C++:

template<typename T, int POS, int SIZE>
struct BitField {
    T *data;

    BitField(T *data) : data(data) {}

    operator int() const {
        return ((*data) >> POS) & ((1ULL << SIZE)-1);
    }

    BitField& operator=(int x) {
        T mask( ((1ULL << SIZE)-1) << POS );
        *data = (*data & ~mask) | ((x << POS) & mask);
        return *this;
    }
};

The above toy implementation allows for example to define a 12-bit field in a unsigned long long variable with

unsigned long long var;

BitField<unsigned long long, 7, 12> muxno(&var);

and the generated code to access the field value is just

0000000000000020 <_Z6getMuxv>:
  20:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax  ; Get &var
  27:   48 8b 00                mov    (%rax),%rax     ; Get content
  2a:   48 c1 e8 07             shr    $0x7,%rax       ; >> 7
  2e:   25 ff 0f 00 00          and    $0xfff,%eax     ; keep 12 bits
  33:   c3                      retq   

Basically what you'd have to write by hand

like image 5
6502 Avatar answered Nov 16 '22 20:11

6502


I have written an implementation of bit fields in C++ as a library header file. An example I give in the documentation is that, instead of writing this:

struct A
  {
    union
      {
        struct
          {
            unsigned x : 5;
            unsigned a0 : 2;
            unsigned a1 : 2;
            unsigned a2 : 2;
          }
        u;
        struct
          {
            unsigned x : 5;
            unsigned all_a : 6;
          }
        v;
      };
  };

// …

A x;
x.v.all_a = 0x3f;
x.u.a1 = 0;

you can write:

typedef Bitfield<Bitfield_traits_default<> > Bf;

struct A : private Bitfield_fmt
  {
    F<5> x;
    F<2> a[3];
  };

typedef Bitfield_w_fmt<Bf, A> Bwf;

// …

Bwf::Format::Define::T x;
BITF(Bwf, x, a) = 0x3f;
BITF(Bwf, x, a[1]) = 0;

There's an alternative interface, under which the last two lines of the above would change to:

#define BITF_U_X_BWF Bwf
#define BITF_U_X_BASE x
BITF(X, a) = 0x3f;
BITF(X, a[1]) = 0;

Using this implementation of bit fields, the traits template parameter gives the programmer a lot of flexibility. Memory is just processor memory by default, or it can be an abstraction, with the programmer providing functions to perform "memory" reads and writes. The abstracted memory is a sequence of elements of any unsigned integral type (chosen by the programmer). Fields can be laid out either from least-to-most or most-to-least significance. The layout of fields in memory can be the reverse of what they are in the format structure.

The implementation is located at: https://github.com/wkaras/C-plus-plus-library-bit-fields

(As you can see, I unfortunately was not able to fully avoid use of macros.)

like image 1
WaltK Avatar answered Nov 16 '22 21:11

WaltK